Manage and validate enumerations for attributes as controlled content
rather than as part of the DTD or XSD document design.
Longer description
The problem: The list
of values for an attribute is determined by the subject matter of the content.
For instance, content for a software design tool will need Architect and Developer
values for the audience attribute. By contrast, content for pharmaceutical
trial reporting might need Researcher and Executive values for the same attribute.
In short, content creators need to be able to extend value enumerations and
share value enumerations with their content.
In traditional approaches,
the DTD or XSD defines controlled values as an enumerated list for an attribute.
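This approach, however, is undesirable for several reasons. In particular,
it fixes the enumeration as part of the document type design, so content creators
cannot extend the values or share them with their content without changing
the DTD or XSD.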
Other standards have noted this problem. For instance, UBL (an OASIS
standard for transactional business data) formally separates the validation
of controlled values (using Schematron) from the validation of the document
types (using XML Schema).
The solution: Provide DITA adopters
with a method for defining controlled values as a stable, highly controlled
portion of their content.
DITA provides maps for defining collections.
Thus, a specialized map offers a natural DITA idiom for defining a collection
of controlled values. A quick summary of the solution:
- Define a controlled value as a key in a specialized DITA map.
- Use the specialized map to organize controlled values in a list or hierarchy
for an attribute.
- In a content map or topic, specify the controlled value as the attribute
value to apply it to the content.
- Where useful, also use the specialized map to define controlled values
in prose topics.
- Where useful, use the specialized map to express relationships between
controlled values.
Technical Requirements
Defining an enumeration: Fundamentally,
a controlled value is a short, readable, and meaningful identifier for a subject.
Such identifiers are a good match for the DITA 1.2 proposal for keys. That is,
the minimum definition of a controlled value could consist of the definition
for a key as part of the enumeration for a category. The following example
defines a flat list of controlled values within the operating system category,
introducing a <subjectdef> element (specialized from <topicref>)
to distinguish the identified thing as a subject rather than a topic content
object:
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
  ...
</subjectScheme>
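As a minimal sketch of how a tool might read such a scheme, the following Python uses the standard library's xml.etree.ElementTree module to list the keys defined directly under the os category. The function name category_values is hypothetical, each keys attribute is assumed to hold a single key, and the ... placeholder is omitted so that the markup parses.

```python
import xml.etree.ElementTree as ET

# The flat operating system scheme from the example above (placeholder omitted).
SCHEME = """
<subjectScheme>
  <subjectdef keys="os">
    <subjectdef keys="linux"/>
    <subjectdef keys="mswin"/>
    <subjectdef keys="zos"/>
  </subjectdef>
</subjectScheme>
"""

def category_values(scheme_xml: str, category: str) -> list[str]:
    """Return the keys of the subjects defined directly under a category key."""
    root = ET.fromstring(scheme_xml.strip())
    for subject in root.iter("subjectdef"):
        if subject.get("keys") == category:
            # Only direct children are values of a flat enumeration.
            return [child.get("keys") for child in subject.findall("subjectdef")]
    return []

print(category_values(SCHEME, "os"))  # ['linux', 'mswin', 'zos']
```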
For clarity and maintainability, a content provider can
supply a navtitle attribute for each value. Tools can display the title
to users while using the key for tagging content. The title can change without
invalidating existing tagging.
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  ...
</subjectScheme>
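The separation between displayed title and stored key can be sketched in a few lines of Python. This is an illustration under assumptions, not a defined tool behavior: the hypothetical pick_list function maps each navtitle to its key, so an editor could show the titles while writing only keys into the content.

```python
import xml.etree.ElementTree as ET

# The titled operating system scheme from the example above (placeholder omitted).
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""

def pick_list(scheme_xml: str, category: str) -> dict[str, str]:
    """Map each displayed navtitle to the key that is written into the content."""
    root = ET.fromstring(scheme_xml.strip())
    for subject in root.iter("subjectdef"):
        if subject.get("keys") == category:
            return {
                # Fall back to the key when a value has no navtitle.
                child.get("navtitle", child.get("keys")): child.get("keys")
                for child in subject.findall("subjectdef")
            }
    return {}

choices = pick_list(SCHEME, "os")
# The user sees "Windows", but the attribute value stored in the content is the key.
print(choices["Windows"])  # mswin
```

Because the content stores mswin rather than Windows, retitling the value in the scheme changes only what the pick list displays.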
The enumeration can be defined with hierarchical levels.
The following example defines RedHat and SuSE as special kinds of Linux. Tools
such as filtering and flagging processes can match content tagged with child
values when a parent value is specified.
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  ...
</subjectScheme>
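One way a filtering or flagging process might perform this matching is to expand a parent value into its whole subtree. The sketch below assumes the hierarchy described above, in which RedHat and SuSE are children of Linux; the helper names subtree_keys and matches are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hierarchical scheme in which RedHat and SuSE are children of Linux.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
      <subjectdef keys="suse" navtitle="SuSE Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""

def subtree_keys(scheme_xml: str, key: str) -> set[str]:
    """Return the key itself plus the keys of all of its descendant subjects."""
    root = ET.fromstring(scheme_xml.strip())
    for subject in root.iter("subjectdef"):
        if subject.get("keys") == key:
            # iter() yields this element itself, then every nested subjectdef.
            return {s.get("keys") for s in subject.iter("subjectdef")}
    return {key}

def matches(filter_value: str, content_value: str, scheme_xml: str) -> bool:
    """True if content tagged with content_value falls under filter_value."""
    return content_value in subtree_keys(scheme_xml, filter_value)

print(matches("linux", "redhat", SCHEME))  # True: RedHat is a kind of Linux
print(matches("redhat", "linux", SCHEME))  # False: matching only runs downward
```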
As with <topicref>, properties of a <subjectdef>
defined in different maps should aggregate. This principle also applies to relationships.
Thus, a different map can extend an existing enumeration out of line by
attaching new values as children of an existing value. The extension can
identify the parent value by key. For instance, a different map can add a
Macintosh subject as a top-level value in the operating system category and
add child subjects under the Windows subject.
<subjectScheme>
  <schemeref href="base_os.ditamap"/>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="win98" navtitle="Windows 98"/>
      <subjectdef keys="winxp" navtitle="Windows XP"/>
    </subjectdef>
  </subjectdef>
  ...
</subjectScheme>
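Aggregation by key can be sketched as a union of parent-to-child relationships gathered from every scheme. In this hypothetical Python sketch, a subjectdef names its subject with either keys or keyref, the content of base_os.ditamap is assumed to be the operating system scheme shown earlier, and both schemes are passed in directly instead of being resolved through schemeref.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed content of base_os.ditamap: the flat operating system scheme.
BASE = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""

# The extension map from the example above, without schemeref or placeholder.
EXTENSION = """
<subjectScheme>
  <subjectdef keyref="os">
    <subjectdef keys="macos" navtitle="Macintosh"/>
    <subjectdef keyref="mswin">
      <subjectdef keys="win98" navtitle="Windows 98"/>
      <subjectdef keys="winxp" navtitle="Windows XP"/>
    </subjectdef>
  </subjectdef>
</subjectScheme>
"""

def subject_key(elem: ET.Element) -> str:
    """A subjectdef names its subject either by defining keys or by a keyref."""
    return elem.get("keys") or elem.get("keyref")

def aggregate(*schemes: str) -> dict[str, list[str]]:
    """Union the parent-to-children relationships declared in several scheme maps."""
    children = defaultdict(list)
    for scheme_xml in schemes:
        root = ET.fromstring(scheme_xml.strip())
        for parent in root.iter("subjectdef"):
            parent_key = subject_key(parent)
            for child in parent.findall("subjectdef"):
                child_key = subject_key(child)
                # A keyref child (such as mswin) may already be known from another map.
                if child_key not in children[parent_key]:
                    children[parent_key].append(child_key)
    return dict(children)

tree = aggregate(BASE, EXTENSION)
print(tree["os"])     # ['linux', 'mswin', 'zos', 'macos']
print(tree["mswin"])  # ['win98', 'winxp']
```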
A category can be extended upward. For instance, a content
provider might create a Software category that includes operating systems.
<subjectScheme>
  <schemeref href="base_os.ditamap"/>
  <subjectdef keys="sw" navtitle="Software">
    <subjectdef keyref="os"/>
    <subjectdef keys="app" navtitle="Applications">
      <subjectdef keys="apacheserv" navtitle="Apache Web Server"/>
      <subjectdef keys="mysql" navtitle="MySQL Database"/>
    </subjectdef>
  </subjectdef>
  ...
</subjectScheme>
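Upward extension means that a subject can gain ancestors from a map other than the one that defined it. The following sketch, with hypothetical link tables taken from the operating system scheme and the Software example, walks from a subject to the top of the combined hierarchy; it assumes each subject has at most one parent.

```python
# Child-to-parent links declared by each map (hypothetical tables).
# Base links: the operating system scheme. Extension links: the Software example.
BASE_LINKS = {"linux": "os", "mswin": "os", "zos": "os"}
EXTENSION_LINKS = {"os": "sw", "app": "sw", "apacheserv": "app", "mysql": "app"}

def ancestors(key: str, *link_maps: dict[str, str]) -> list[str]:
    """Walk upward from a subject through the parents declared in all maps."""
    parents = {}
    for links in link_maps:
        parents.update(links)
    chain = []
    # Stop at the top of the hierarchy, or if a cycle would revisit a parent.
    while key in parents and parents[key] not in chain:
        key = parents[key]
        chain.append(key)
    return chain

print(ancestors("linux", BASE_LINKS, EXTENSION_LINKS))  # ['os', 'sw']
print(ancestors("linux", BASE_LINKS))                   # ['os']
```

Combined with the subtree matching described for hierarchical values, content tagged linux would then also match a filter on sw.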
When sharing controlled values, content teams must apply
the same interpretation to each value. Otherwise, the value will associate
dissimilar content. For instance, if one content team regards UNIX
as including Linux while another regards Linux and UNIX as exclusive, a definition
of the meaning of their values will help the two teams discover and accommodate
the discrepancy. (The second team will need to define a new parent subject
for their existing Linux and UNIX subjects and equate that parent subject
with the other team's UNIX subject.)
To define a controlled value,
a content team can supply a definitional topic (similar to an entry in an
encyclopaedia or glossary) at any time:
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux" href="subject/linux.dita"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="unix" navtitle="UNIX" href="subject/unix.dita"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
  ...
</subjectScheme>
<concept id="linux">
  <title>The Linux operating system</title>
  <body>
    <p>Although Linux has historical roots in UNIX, ...</p>
  </body>
</concept>
<concept id="unix">
  <title>The UNIX operating system</title>
  <body>
    <p>As a commercial operating system, UNIX differs from Linux ...</p>
  </body>
</concept>
In fact, when a subject is defined with a key but not an
href, the key can be thought of as an identifier for a virtual definitional
topic that isn't needed for a well-known subject but could be provided later.
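A tool could treat the href as optional in exactly this way. The sketch below, using the scheme from the definitional-topic example and a hypothetical definitions function, maps each key to its definitional topic and records None for subjects whose topic is still virtual.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The definitional-topic scheme from the example above (placeholder omitted).
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux" href="subject/linux.dita"/>
    <subjectdef keys="mswin" navtitle="Windows"/>
    <subjectdef keys="unix" navtitle="UNIX" href="subject/unix.dita"/>
    <subjectdef keys="zos" navtitle="z/OS"/>
  </subjectdef>
</subjectScheme>
"""

def definitions(scheme_xml: str) -> dict[str, Optional[str]]:
    """Map each subject key to its definitional topic, or None if it is still virtual."""
    root = ET.fromstring(scheme_xml.strip())
    return {s.get("keys"): s.get("href") for s in root.iter("subjectdef")}

defs = definitions(SCHEME)
print(defs["linux"])  # subject/linux.dita
print(defs["mswin"])  # None: a well-known subject without a topic (yet)
```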
By
organizing controlled values in a subsuming hierarchy and defining each controlled
value precisely in a topic, a content provider is in fact creating a formal
taxonomy. The specialized map can define more precise hierarchical or associative
relationships between subjects.
<subjectScheme>
  <subjectdef keys="mswin" navtitle="Windows">
    <hasPart>
      <subjectdef keys="iexplorer" navtitle="Internet Explorer Browser"/>
      ...
    </hasPart>
    ...
  </subjectdef>
  <relatedSubjects>
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mysql" navtitle="MySQL Database"/>
    ...
  </relatedSubjects>
  ...
</subjectScheme>
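To show how a tool might consume these relationships, the following sketch turns the example into (subject, relation, subject) triples. The placeholders are omitted, the function name is hypothetical, and treating every pair of members in relatedSubjects as related is an assumption about that element's meaning.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# The relationship example above, without the placeholders.
SCHEME = """
<subjectScheme>
  <subjectdef keys="mswin" navtitle="Windows">
    <hasPart>
      <subjectdef keys="iexplorer" navtitle="Internet Explorer Browser"/>
    </hasPart>
  </subjectdef>
  <relatedSubjects>
    <subjectdef keys="linux" navtitle="Linux"/>
    <subjectdef keys="mysql" navtitle="MySQL Database"/>
  </relatedSubjects>
</subjectScheme>
"""

def relationships(scheme_xml: str) -> list[tuple[str, str, str]]:
    """Extract (subject, relation, subject) triples from the scheme."""
    root = ET.fromstring(scheme_xml.strip())
    triples = []
    # hasPart: subjects inside the container are parts of the enclosing subject.
    for whole in root.iter("subjectdef"):
        for part_group in whole.findall("hasPart"):
            for part in part_group.findall("subjectdef"):
                triples.append((whole.get("keys"), "hasPart", part.get("keys")))
    # relatedSubjects: assumed to relate every pair of its members.
    for group in root.iter("relatedSubjects"):
        members = [s.get("keys") for s in group.findall("subjectdef")]
        for a, b in combinations(members, 2):
            triples.append((a, "related", b))
    return triples

for triple in relationships(SCHEME):
    print(triple)
```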
While available, such formality isn't required.
The
existing DITA taxonomy specialization (available as a plugin for the DITA
Open Toolkit) provides a precedent for defining subjects in this way.
Binding
a value category to an attribute: The specialized map can specify that
an attribute's values should be limited to one or more categories. The following
example uses specialized elements to associate the platform attribute with
the operating system category:
<valuedef type="keys">
  <enumeration type="attribute" name="platform"/>
  <subjectdef keyref="os"/>
</valuedef>
The specialized map defining the controlled values and
their binding to attributes can be registered with tools or processes using
tool-specific mechanisms (for instance, using catalogs).
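As a rough illustration of what a tool could do with such a binding once it is registered, the sketch below combines a category definition and the valuedef example into one scheme and computes the allowed values for each bound attribute. The function name bound_values is hypothetical, and excluding the category key itself from the allowed values is an assumption.

```python
import xml.etree.ElementTree as ET

# A category definition plus the binding from the example above, wrapped in
# one subjectScheme so that it parses.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux">
      <subjectdef keys="redhat" navtitle="RedHat Linux"/>
    </subjectdef>
    <subjectdef keys="mswin" navtitle="Windows"/>
  </subjectdef>
  <valuedef type="keys">
    <enumeration type="attribute" name="platform"/>
    <subjectdef keyref="os"/>
  </valuedef>
</subjectScheme>
"""

def bound_values(scheme_xml: str) -> dict[str, set[str]]:
    """Map each bound attribute name to the keys allowed for it."""
    root = ET.fromstring(scheme_xml.strip())
    # Index subject definitions by key; the binding refers to them by keyref.
    defined = {s.get("keys"): s for s in root.iter("subjectdef") if s.get("keys")}
    allowed = {}
    for binding in root.iter("valuedef"):
        names = [e.get("name") for e in binding.findall("enumeration")
                 if e.get("type") == "attribute"]
        values = set()
        for ref in binding.findall("subjectdef"):
            category_key = ref.get("keyref")
            category = defined.get(category_key)
            if category is not None:
                subtree = {s.get("keys") for s in category.iter("subjectdef")}
                # Assumption: the allowed values are the descendants of the
                # category, not the category key itself.
                values |= subtree - {category_key}
        for name in names:
            allowed.setdefault(name, set()).update(values)
    return allowed

print(sorted(bound_values(SCHEME)["platform"]))  # ['linux', 'mswin', 'redhat']
```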
Tagging
content with values: After controlled values have been bound to an attribute
and registered with tools, tools can validate the attribute. For instance,
an editor could prevent the user from entering "linix" as a platform
value or provide a pick list offering the titles of the operating system
subjects for selection by the user.
<note platform="linux">Don't remove the root directory.</note>
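An editor-side check of this kind might look like the following sketch. The ALLOWED table is a hypothetical stand-in for values computed from a binding, the function name check_attribute is likewise illustrative, and the spelling suggestion uses Python's difflib.get_close_matches.

```python
import difflib

# Hypothetical allowed values, as a tool might compute them from the platform binding.
ALLOWED = {"platform": {"linux", "redhat", "suse", "mswin", "zos"}}

def check_attribute(name: str, value: str, allowed=ALLOWED) -> list[str]:
    """Return error messages for a tagged value; an empty list means it is valid."""
    if name not in allowed:
        return []  # The attribute is not controlled, so any value passes.
    errors = []
    # Attributes such as platform can hold several space-separated values.
    for token in value.split():
        if token not in allowed[name]:
            hint = difflib.get_close_matches(token, sorted(allowed[name]), n=1)
            suffix = f" (did you mean {hint[0]!r}?)" if hint else ""
            errors.append(f"{name}={token!r} is not a controlled value{suffix}")
    return errors

print(check_attribute("platform", "linux"))  # []
print(check_attribute("platform", "linix"))
```

For the "linix" typo, the close-match search suggests linux, which an editor could offer as a correction.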
Some content providers won't want to define new attributes
for categories of controlled values. Such content providers can instead use
a specialized <topicapply> element to indicate that a subject applies to a
topic for filtering and flagging.
<map>
  ...
  <topicref href="troubleshootingLamp.dita">
    <topicapply keyref="linux"/>
    ...
  </topicref>
  ...
</map>
Multiple values can be listed as <subjectref> elements within the <topicapply>
element:
<map>
  ...
  <topicref href="troubleshootingLamp.dita">
    <topicapply>
      <subjectref keyref="linux"/>
      <subjectref keyref="apacheserv"/>
      <subjectref keyref="mysql"/>
      <subjectref keyref="perl"/>
    </topicapply>
    ...
  </topicref>
  ...
</map>
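A processor would need to accept both the single-value and the multi-value forms of <topicapply>. The hypothetical sketch below collects the applied keys for each topic and, as an assumption borrowed from DITAVAL filtering of multi-valued attributes, excludes a topic only when every applied subject is excluded.

```python
import xml.etree.ElementTree as ET

# The multi-value map example above, without the placeholders.
MAP = """
<map>
  <topicref href="troubleshootingLamp.dita">
    <topicapply>
      <subjectref keyref="linux"/>
      <subjectref keyref="apacheserv"/>
      <subjectref keyref="mysql"/>
      <subjectref keyref="perl"/>
    </topicapply>
  </topicref>
</map>
"""

def applied_subjects(map_xml: str) -> dict[str, list[str]]:
    """Map each topic href to the subject keys applied to it with topicapply."""
    root = ET.fromstring(map_xml.strip())
    result = {}
    for topicref in root.iter("topicref"):
        keys = []
        for applicability in topicref.findall("topicapply"):
            # Single-value form: <topicapply keyref="..."/>
            if applicability.get("keyref"):
                keys.append(applicability.get("keyref"))
            # Multi-value form: <subjectref> children
            keys.extend(ref.get("keyref") for ref in applicability.findall("subjectref"))
        result[topicref.get("href")] = keys
    return result

def is_excluded(keys: list[str], excluded: set[str]) -> bool:
    """Exclude a topic only when every applied subject is excluded (assumed rule)."""
    return bool(keys) and all(k in excluded for k in keys)

subjects = applied_subjects(MAP)["troubleshootingLamp.dita"]
print(subjects)                          # ['linux', 'apacheserv', 'mysql', 'perl']
print(is_excluded(subjects, {"linux"}))  # False: the other subjects still apply
```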
Where a controlled value has a definitional topic, a reference
to the definitional topic can be used instead of the key. (Anything else
would be inconsistent with the general behavior of keys.)
<map>
  ...
  <topicref href="troubleshootingLamp.dita">
    <topicapply href="subject/linux.dita"/>
    ...
  </topicref>
  ...
</map>
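Accepting a definitional-topic reference in place of a key implies a lookup from href back to key. The following sketch, with a hypothetical resolve_subject function and a trimmed copy of the definitional-topic scheme, normalizes both forms to the same key.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A trimmed copy of the definitional-topic scheme shown earlier.
SCHEME = """
<subjectScheme>
  <subjectdef keys="os" navtitle="Operating system">
    <subjectdef keys="linux" navtitle="Linux" href="subject/linux.dita"/>
    <subjectdef keys="unix" navtitle="UNIX" href="subject/unix.dita"/>
  </subjectdef>
</subjectScheme>
"""

def resolve_subject(elem: ET.Element, scheme_xml: str) -> Optional[str]:
    """Return the subject key identified by a topicapply or subjectref element."""
    if elem.get("keyref"):
        return elem.get("keyref")
    root = ET.fromstring(scheme_xml.strip())
    # Reverse lookup: definitional topic href -> subject key.
    by_href = {
        s.get("href"): s.get("keys")
        for s in root.iter("subjectdef")
        if s.get("href")
    }
    return by_href.get(elem.get("href"))  # None if no subject uses this href

by_key = ET.fromstring('<topicapply keyref="linux"/>')
by_topic = ET.fromstring('<topicapply href="subject/linux.dita"/>')
print(resolve_subject(by_key, SCHEME), resolve_subject(by_topic, SCHEME))  # linux linux
```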
Finally, to distinguish content that is truly about a
subject and thus appropriate for retrieval (as opposed to merely applicable
to a subject and thus appropriate for filtering and flagging), the content
provider can use a specialized <topicsubject> element:
<map>
  ...
  <topicref href="linuxCapabilities.dita">
    <topicsubject keyref="linux"/>
    ...
  </topicref>
  ...
</map>
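The distinction could drive separate processing: topicapply for filtering and flagging, topicsubject for retrieval. As a hypothetical sketch, the index below records only topics that are about a subject, so the merely applicable troubleshooting topic is left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A map combining the topicsubject and topicapply examples above.
MAP = """
<map>
  <topicref href="linuxCapabilities.dita">
    <topicsubject keyref="linux"/>
  </topicref>
  <topicref href="troubleshootingLamp.dita">
    <topicapply keyref="linux"/>
  </topicref>
</map>
"""

def retrieval_index(map_xml: str) -> dict[str, list[str]]:
    """Index topics by the subjects they are about; topicapply is ignored here."""
    root = ET.fromstring(map_xml.strip())
    index = defaultdict(list)
    for topicref in root.iter("topicref"):
        for about in topicref.findall("topicsubject"):
            index[about.get("keyref")].append(topicref.get("href"))
    return dict(index)

print(retrieval_index(MAP))  # {'linux': ['linuxCapabilities.dita']}
```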
Note: For consistency, references to definitional topics might
be accepted as synonyms for key values in the DITA values file.