Subject: Re: [dita] Namespace resolution
From: Eliot Kimber <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Thu, 19 Aug 2004 13:27:31 -0500

Erik Hennum wrote:

> An application needs to recognize the DITA document type shell to process a
> complete DITA vocabulary with full semantics.  For instance, if a
> dispatcher looks only at the type of the root element:

I think I understand what Erik is getting at here: A given *combination* 
of specialization packages represents a unique XML *application* and 
therefore needs an identifier of its own.

I think I agree with that. What I don't agree with is that any external 
identifier of any external entity (including external declaration sets) 
*is that identifier*.

That is, the XML world currently only gives us one way to unambiguously 
bind documents to their vocabularies and that is namespaces.

Given that then I agree that *if* a document needs to be recognized as 
being governed by a particular XML application then that application 
should have an associated namespace and the document should declare it.

Before we go farther I would like to suggest a terminology set that 
might help us to be clearer in this discussion as some terms, such as 
"document type", are far too overloaded to be useful at this point.

Here are my suggested terms:

- *Specialization package*

   A set of element types and supporting components that use the 
DITA-defined class mechanism to specialize elements from another, more 
general, package or packages.

   Specialization packages can generally be characterized as "topic" 
specializations or "domain" specializations.

   "Topic" specializations directly or indirectly specialize the base 
DITA "topic" element type and its subelements. Topic specializations are 
intended to govern compound documents that contain complete topics, that 
is documents that can be meaningfully processed in isolation as a single 
unit of delivery.

   "Domain" specializations specialize elements below the "topic" level, 
often specializing elements that occur within a paragraph context (e.g., 
elements that identify mentions of names of domain-specific objects such 
as user-interface components, object class names, method names, part 
numbers, etc.). Domain specializations are not usually intended to 
govern entire compound documents. [Such elements might be the roots of 
documents that are conreffed into larger compound documents but that 
would not normally be meaningfully processed in isolation.]

   A complete specialization package includes the following components:

   - A prose definition of the abstract (implementation-independent) 
element types in the vocabulary, including applicable business rules 
that govern the types' creation, management, and processing. (E.g., a 
"data dictionary" for the package.)

   - A DTD- or schema-based set of syntax rules for the element types, 
attributes, and content defined in the package. These definitions 
should, as much as possible, be designed and implemented to enable easy 
integration with larger XML applications.

   - Processor components that implement specific business rules in the 
service of specific business processes (e.g., authoring, management, 
rendition, interchange, etc.). For example, a specialization package 
might include XSLT modules or Java classes that implement core behaviors 
of the package's element types.


- *core DITA packages*

   The specialization packages normatively defined by the DITA 
specification, i.e., topic, task, concept, reference, and the domain 
packages. These specialization packages are "well known".

- *user-defined specialization package*

   A specialization package not defined by the DITA specification.

- *DITA application*

   A complete ("encompassing") vocabulary composed from one or more 
DITA-based specialization packages such that every element type in the 
application is a direct or indirect specialization of one of the core 
DITA-defined element types.  For example, you might combine the 
"concept" topic specialization module with just the "ui-domain" domain 
specialization package to create an application for documents that 
describe user interfaces.

   A complete application definition includes:

   - DTD- or schema-based syntax rules for XML element types, 
attributes, and content.

   - Prose definitions of the business rules that govern the creation, 
management, processing, and delivery of documents governed by the 
application.

   - Processors that implement the business rules in the service of 
specific business processes (e.g., authoring, management, rendition, 
interchange, etc.)

   For DITA applications, many of these components may be shared. For 
example, the DITA-provided schema rules for a given package can be used 
directly by the schema for a particular application. Likewise, the 
application's documentation can refer to the DITA-provided documentation 
to form a complete set of prose definitions.

- *core DITA applications*

   The applications defined normatively by the DITA specification.

- *DITA-using XML application*

   A vocabulary that includes elements that are specializations of DITA 
elements using the DITA class mechanism but that includes element types 
that are not themselves specializations of core DITA types. For example, 
you might create an XML application that uses just the DITA syntax 
diagram package.

   DITA-using applications have the same definitional components as DITA 
applications and may likewise re-use shared or shareable components.

- *DITA document*

   An XML document instance that uses the DITA-defined class mechanism 
to bind itself to one or more specialization packages. A DITA document 
may be governed by either a DITA application or a DITA-using XML 
application.

- *DITA-only document*

   An XML document instance that is governed by a DITA application.

- *document type (XML)*

   The element type of the root element of a document instance. Let us 
agree to *never* use the term "document type" when we mean "XML 
application" or "XML vocabulary".

Given this terminology then I think the namespace requirements can be 
stated as:

1. Each specialization package has an associated unique namespace.

2. Each DITA application has an associated unique namespace. 
Essentially, each distinct combination of packages represents a unique 
vocabulary. For well-known packages there should be exactly one 
namespace for each possible useful combination of packages.

3. In order to allow the generic processing of DITA documents the 
DITA-defined class attribute should be qualified with its own 
namespace. This allows processors to reliably identify DITA documents 
without having any further knowledge of any DITA packages or 
applications. It also ensures that the name of the DITA-defined class 
attribute will be invariant across all possible DITA applications. Let 
us call this the "DITA base namespace".
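
To make requirement (3) concrete, here is a minimal Python sketch (standard library only) of how a generic processor might recognize a DITA document purely from the namespace-qualified class attribute. The namespace URI is a placeholder I made up for illustration, and the function name is hypothetical; nothing here is a settled part of the proposal.

```python
import xml.etree.ElementTree as ET

# Placeholder URI standing in for the "DITA base namespace" (an assumption).
DITA_BASE_NS = "http://example.org/dita/base"
# ElementTree spells a qualified attribute name as "{uri}local".
CLASS_ATTR = f"{{{DITA_BASE_NS}}}class"


def uses_dita_class_mechanism(xml_text: str) -> bool:
    """Return True if any element carries the namespace-qualified class attribute."""
    root = ET.fromstring(xml_text)
    return any(CLASS_ATTR in elem.attrib for elem in root.iter())


# The prefix ("ditabase") and the element names are arbitrary; only the URI matters.
dita_doc = f"""
<a xmlns:ditabase="{DITA_BASE_NS}" ditabase:class=" topic/topic concept/concept ">
  <b ditabase:class=" topic/title ">Title</b>
</a>
"""
# An unqualified "class" attribute, as in HTML, is not mistaken for the DITA one.
other_doc = '<html class="page"><body/></html>'

print(uses_dita_class_mechanism(dita_doc))   # True
print(uses_dita_class_mechanism(other_doc))  # False
```

The check needs no knowledge of any package or application, which is the point of giving the class attribute its own namespace.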

Given these rules and remembering that for the purposes of mapping 
element *instances* to processing all that is required is the class 
attribute value, I think all of the following will be possible in a 1.0 
time frame:

1. Legacy DTDs and documents in which the only change is the 
qualification of the class attribute to use the DITA base namespace. 
Existing DITA-aware processors would need to be updated to recognize the 
qualified form of the class attribute but we've already demonstrated 
that this is easy to do and can be done in a way that allows the 
processing of both qualified and unqualified class attributes.

2. Documents that use only the DITA application namespaces to identify 
the DITA application that governs them. Note that this is independent of 
whether or not the document is DTD- or schema-based. As long as this 
namespace is the default namespace, element instances need not change.

3. DTD-less and schema-less documents that are both recognizable as DITA 
documents and, if all class attributes are made explicit on elements, 
processible by normal class-based DITA processors.

4. DITA applications in which all element type names are in the same 
namespace (the application namespace) regardless of which package those 
elements are drawn from. This is possible because all DITA-specific 
processing should be conditioned on the class attribute values, so 
element type names are essentially arbitrary.

5. DITA applications in which element type names are qualified with 
their corresponding package namespaces. This is possible for the same 
reason (4) is possible: element type names are arbitrary.
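
As a rough illustration of cases (1), (4), and (5), the following Python sketch looks up the class attribute in either its qualified or its legacy unqualified form and ignores the element type name entirely. The namespace URIs and the helper name are placeholders of mine, not anything defined by DITA.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DITA_BASE_NS = "http://example.org/dita/base"  # placeholder URI (assumption)


def dita_class(elem: ET.Element) -> Optional[str]:
    """Prefer the qualified class attribute, fall back to the unqualified one."""
    qualified = elem.get(f"{{{DITA_BASE_NS}}}class")
    return qualified if qualified is not None else elem.get("class")


# Legacy document: unqualified class attribute, unqualified element name.
legacy = ET.fromstring('<concept class=" topic/topic concept/concept "/>')

# Updated document: qualified class attribute, element name in an
# application namespace (hypothetical URI).
updated = ET.fromstring(
    f'<app:thing xmlns:app="http://example.org/my-app" xmlns:d="{DITA_BASE_NS}" '
    'd:class=" topic/topic concept/concept "/>'
)

# Both elements map to the same DITA class despite different names.
print(dita_class(legacy) == dita_class(updated))  # True
```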

This approach would mean the following, I think:

1. The DITA 1.0 spec does not have to *require* the use of namespaces 
for anything but the class attribute and the existing DITA-provided DTDs 
and schemas need only be changed to add this qualification--there is no 
need for them to also qualify element type names or package names with 
application or specialization namespaces. That is, the current 
IBM-submitted DTDs and schemas can be used essentially as-is in 1.0.

This means that conforming DITA documents need not have any namespaces 
for the element type names, reflecting current IBM DITA practice.

2. The namespace prefixes for the core DITA packages are "magic" and 
must be used as-is in class attribute values in DITA 1.0. This 
avoids any requirement for DITA 1.0 processors to have to be prepared to 
dereference core package names to namespace URIs.

3. The DITA 1.0 spec can *discuss* the other ways in which namespaces 
_can_ be used in conforming DITA applications without actually 
requiring it or doing it in the OASIS-provided DTDs and schemas.

4. DITA-aware tools can use the DITA base namespace to reliably and 
unambiguously recognize DITA documents and to distinguish DITA documents 
from non-DITA documents. For example, the requirements of my XIRUSS-T 
content management tool are met.

5. Tools that condition processing based on the *DITA application* being 
used can either continue to use external identifiers to indicate the 
application (the current mechanism) or can require the use of 
application-specific namespaces if they so choose (for example, both 
XIRUSS-T and Microsoft Word would have to require the use of application 
namespaces as neither uses DOCTYPE declarations in any way).

Note that this enables the use of the existing DTD and schema files as 
provided but does not require their use--the ultimate test of DITA 
conformance must still be architectural validation. If someone wants to 
have namespace-qualified element types they'll need to create their own 
versions of the DTDs or schemas that add the appropriate namespace 
declarations and qualifications. There might be a way to enable this in 
the DTDs using parameter entities but I'm not sure it's worth the effort 
to do it.

6. User-defined specialization packages *must* be namespace qualified 
and DITA processors should expect to have to dereference non-core 
package names used in class attributes to namespace URIs. I don't see a 
way around this as the alternative is to accept the potential for 
unresolvable package name collision in class attribute values. I don't 
think this is a hardship in practice. It does suggest that perhaps there 
are at least two levels of conformance for DITA processors: those that 
only recognize core DITA packages and those that can handle all packages.

Because the class attribute values would still use the same syntax they 
do today, existing techniques for matching class values would still 
work, it would just be up to document authors or DITA application 
builders to ensure that all package names are unique and consistent (for 
example, to enable reliable binding of CSS styles to class attribute 
values).
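
For reference, the usual matching technique is a whole-token test on the class value, the same idea as the XSLT test contains(@class, ' topic/p '). A minimal Python sketch, with a helper name of my own choosing:

```python
def matches_class(class_value: str, token: str) -> bool:
    """True if `token` (e.g. "topic/p") occurs as a whole token in the class value.

    Relies on the convention that class values start and end with a space,
    so every token is delimited by spaces on both sides.
    """
    return f" {token} " in class_value


value = " topic/topic concept/concept mypackage/myconcept "

print(matches_class(value, "concept/concept"))      # True
print(matches_class(value, "mypackage/myconcept"))  # True
print(matches_class(value, "topic/p"))              # False
print(matches_class(value, "pic/topic"))            # False: partial tokens do not match
```

Nothing in this test depends on namespaces, which is why it keeps working as long as package names within one application are unique.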

>>I would state this more strongly: package identifiers *are* namespace
>>identifiers. That is, for every package identifier this is/must be
>>exactly one corresponding namespace URI declared within the scope of the
>>use of the package identifier (that is, on the element or one of its
>>ancestors).
> 
> 
> In principle, I agree strongly.  In practice, my concern is that, to
> implement this approach, we have to solve problems like swapping namespaces
> in and out of the class attribute during generalization and
> respecialization.

I'm not sure I understand this comment: the value of the class attribute 
is (conceptually) just a list of namespace prefixes that map to the URIs 
for packages. The class attribute value need never change. If we ignore 
schema-less documents for the moment, then at worst it would require a 
DITA application implementor to update the *fixed* values of class 
attributes in order to disambiguate the prefixes of two user-defined 
specialization packages that happen to use the same prefix value by 
default. But this is a very small cost and only affects the application 
implementor--it would not affect authors or processor implementors (at 
least for processors that can dereference package prefixes to namespace 
URIs).
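
A sketch of what "dereferencing package prefixes to namespace URIs" could look like, in Python. I am assuming the in-scope prefix-to-URI bindings have already been collected into a dictionary; the set of core prefixes is a partial guess (the core domain packages are omitted), and all URIs are hypothetical.

```python
# Core package prefixes are treated as "magic" and never dereferenced.
# Partial list for illustration only; core domain packages are omitted.
CORE_PACKAGES = {"topic", "concept", "task", "reference"}


def resolve_class(class_value: str, in_scope: dict) -> list:
    """Turn " pkg/type ... " into a list of (package identity, type name) pairs.

    For user-defined packages the identity is the namespace URI bound to the
    prefix, so two packages sharing a prefix remain distinguishable.
    """
    resolved = []
    for token in class_value.split():
        package, _, type_name = token.partition("/")
        if package in CORE_PACKAGES:
            resolved.append((package, type_name))
        elif package in in_scope:
            resolved.append((in_scope[package], type_name))
        else:
            raise ValueError(f"no namespace declared for package prefix {package!r}")
    return resolved


value = " topic/topic concept/concept mypackage/myconcept "
app_one = {"mypackage": "http://example.org/acme/widgets"}    # hypothetical
app_two = {"mypackage": "http://example.org/other/gadgets"}   # hypothetical

print(resolve_class(value, app_one)[-1])
# ('http://example.org/acme/widgets', 'myconcept')
print(resolve_class(value, app_one) == resolve_class(value, app_two))  # False
```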

For schema-less documents, where all the class attributes would be 
explicit on each element, you would have to rewrite all the class values 
in order to use that document with an updated DITA application, but even 
that can be handled through a trivial transform and is probably a rare 
case in any event (the use of schemaless documents not being 
particularly good practice in most use cases).
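
The "trivial transform" for schema-less documents might look something like the following Python sketch, which renames one package prefix in every explicit class value. The base namespace URI, the function name, and the sample prefixes are all placeholders.

```python
import xml.etree.ElementTree as ET

DITA_BASE_NS = "http://example.org/dita/base"  # placeholder URI (assumption)
CLASS_ATTR = f"{{{DITA_BASE_NS}}}class"


def rename_package(root: ET.Element, old: str, new: str) -> None:
    """Rewrite every "old/type" token in class values to "new/type", in place."""
    for elem in root.iter():
        value = elem.get(CLASS_ATTR)
        if value is None:
            continue
        tokens = []
        for token in value.split():
            package, _, type_name = token.partition("/")
            tokens.append(f"{new}/{type_name}" if package == old else token)
        # Preserve the leading and trailing space that class matching relies on.
        elem.set(CLASS_ATTR, " " + " ".join(tokens) + " ")


doc = ET.fromstring(
    f'<x xmlns:d="{DITA_BASE_NS}" '
    'd:class=" topic/topic concept/concept mypackage/myconcept "/>'
)
rename_package(doc, "mypackage", "acme")
print(repr(doc.get(CLASS_ATTR)))
# ' topic/topic concept/concept acme/myconcept '
```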

> On the class attribute question, I'm sorry that I misunderstood before.
> Can you expand on the benefits of attaching a namespace to the class
> attribute itself?  I see the importance of attaching namespaces to elements
> whether manifest in the document or latent within the value of the class
> attribute.  I would have thought that, like other attributes, the class
> attribute itself should be in the same namespace as the element containing
> it. That would seem less complex for authors and processes.

I think you've got it exactly backward: In a DITA context, there is *no 
particular value* to having element type names in any particular 
namespace precisely because they are identified, for processing 
purposes, entirely by the values of the class attributes.

This is independent of authoring concerns (user interface)--in that 
context element type names are important but since they are also 
arbitrary with respect to DITA processing they essentially factor out of 
this discussion.

I also realized that I'm making an assumption that might not be 
universal: the value of the class attribute fully qualifies the element 
type. That is, if the element type name is "concept" then the class 
attribute value is " topic/topic concept/concept " and if the element 
type is a further specialization of "concept", "myconcept", then the 
class attribute value is

   " topic/topic concept/concept mypackage/myconcept ".

As long as this is always the case then the element type name is simply 
irrelevant for the purpose of DITA-based processing. That is, from a 
DITA perspective, the element type name is, by definition, a synonym for 
the element's class name. For example, consider this declaration set 
intended to create the smallest possible document instances:

<!ELEMENT a (b, c+) >
<!ATTLIST a
    id
      NMTOKEN
      #REQUIRED
    ditabase:class
      CDATA
      #FIXED " topic/topic concept/concept "
    xmlns:ditabase
      CDATA
      #FIXED "http://dita.oasis-open.org/1.0/DITA base"
 >

<!-- Content models for b and c are assumed to be #PCDATA. -->
<!ELEMENT b (#PCDATA) >
<!ATTLIST b
    ditabase:class CDATA #FIXED " topic/title "
 >

<!ELEMENT c (#PCDATA) >
<!ATTLIST c
    ditabase:class CDATA #FIXED " topic/p "
 >

This might not be easy to author but it's 100% understandable by a 
DITA-aware processor and is a completely valid DITA instance (assuming 
I've actually declared the elements correctly which I haven't bothered 
to check).
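
To show that such an instance really is understandable from the class values alone, here is a small Python sketch that reads a document with single-letter element names and recovers each element's specialization chain. The class attributes are written out explicitly on the instance (the schema-less case), the namespace URI is a placeholder, and the helper name is mine.

```python
import xml.etree.ElementTree as ET

DITA_BASE_NS = "http://example.org/dita/base"  # placeholder URI (assumption)
CLASS_ATTR = f"{{{DITA_BASE_NS}}}class"


def specialization_chain(elem: ET.Element) -> list:
    """Split a class value into (package, type) pairs, most general first."""
    value = elem.get(CLASS_ATTR, "")
    return [tuple(token.split("/", 1)) for token in value.split()]


doc = ET.fromstring(f"""<a xmlns:ditabase="{DITA_BASE_NS}" id="x1"
   ditabase:class=" topic/topic concept/concept ">
  <b ditabase:class=" topic/title ">Tiny</b>
  <c ditabase:class=" topic/p ">Body text.</c>
</a>""")

for elem in doc.iter():
    chain = specialization_chain(elem)
    # The last pair is the element's own type, the first its base DITA type.
    print(elem.tag, "is", chain[-1], "based on", chain[0])
```

The element names a, b, and c never enter into the mapping; a processor that knows only topic/topic, topic/title, and topic/p can still handle the document.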

The class attribute is the one invariant in the system that enables all 
other processing. Therefore it needs to be in its own, independent, 
namespace so that processors can always find it reliably.

One way to think about this is to identify the layers of processing and 
what information you need to reliably map data to processing at each level.

Starting with an input XML document a processor has to ask the following 
questions:

1. Is this document a DITA document? (That is, does it use the 
DITA-defined class mechanism to bind its elements and attributes to the 
DITA element types and attributes?)

   This question can be answered by looking for the DITA base namespace 
(which qualifies the DITA-specific class attribute). If this namespace 
declaration is found somewhere in the document then the document *must* 
be a DITA document. If this namespace declaration is not found then the 
document *cannot be* a DITA document.

   Answer: no -> Go to question 3

   Answer: yes -> Document is a DITA document, proceed to question 2:

2. Is this DITA document governed by a DITA application I recognize?

    This question can be answered unambiguously by looking for namespace 
declarations that name known DITA application namespace URIs on the root 
element. It can be answered with reasonable (but not 100%) certainty by 
looking at the external identifier of the document's DOCTYPE declaration 
or non-namespace schema or by taking the user's word that this is in 
fact a DITA document governed by a particular application [this is the 
implication when you apply an application-specific XSLT to a document 
for example or when you work in an environment that only supports one 
XML application.]

    Answer: no -> Go to question 3

    Answer: yes -> Document is a DITA-only document. Apply processing 
for the application(s) that apply to the document.

3. Does this document conform to a non-DITA application I recognize?

    This question is answered as for question (2): look for known 
namespaces, external DTD subsets, schema instances, or take the user's 
word for it.

    Answer: no -> Apply DITA-specific processing based on class 
attribute values.

    Answer: yes -> Apply application-specific processing (which may 
itself apply DITA-specific processing if the application happens to be 
DITA-aware).
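
The chain of questions can be sketched as a small dispatcher in Python. This looks only at namespace declarations (not DOCTYPE or the user's say-so), all URIs except the XHTML one are hypothetical, and the final "unrecognized" outcome for a non-DITA document that matches no known application is my own reading of the fall-through case.

```python
import io
import xml.etree.ElementTree as ET

DITA_BASE_NS = "http://example.org/dita/base"                 # placeholder
KNOWN_DITA_APPS = {"http://example.org/dita/app/concept-ui"}  # hypothetical
KNOWN_OTHER_APPS = {"http://www.w3.org/1999/xhtml"}


def declared_namespaces(xml_bytes: bytes):
    """Return (URIs declared on the root element, URIs declared anywhere)."""
    on_root, anywhere = set(), set()
    seen_root = False
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            _prefix, uri = payload
            anywhere.add(uri)
            # Declarations on an element are reported before its start event.
            if not seen_root:
                on_root.add(uri)
        else:
            seen_root = True
    return on_root, anywhere


def classify(xml_bytes: bytes) -> str:
    on_root, anywhere = declared_namespaces(xml_bytes)
    is_dita = DITA_BASE_NS in anywhere            # question 1
    if is_dita and on_root & KNOWN_DITA_APPS:     # question 2
        return "dita-application"
    if on_root & KNOWN_OTHER_APPS:                # question 3
        return "other-application"
    # Fall back to generic class-based processing when that is possible.
    return "generic-dita" if is_dita else "unrecognized"


doc = (
    f'<concept xmlns="http://example.org/dita/app/concept-ui" '
    f'xmlns:d="{DITA_BASE_NS}" d:class=" topic/topic concept/concept "/>'
).encode()
print(classify(doc))  # dita-application
```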

Notice that at *no point* in this chain of questions does the namespace 
qualification of element types come into play. All that is important is 
the declaration or non-declaration of namespace URIs--we absolutely 
don't care how those namespaces are subsequently used on elements.

The only point at which namespace qualification could become an issue is 
in application-specific processing of elements that is not done based on 
their DITA class values.

Cheers,

E.
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com