
Subject: Re: [dita] Namespace resolution

From: Erik Hennum <ehennum@us.ibm.com>
To: ekimber@innodata-isogen.com
Date: Tue, 10 Aug 2004 19:56:52 -0700

Hi, Eliot and TC Members:

I'd submit that DITA has a distinction between document type shells and specialization packages in the architecture and not just in the syntax.

By themselves, specialization packages are pluggable design building blocks. The document type shell realizes a complete design by assembling a set of specialization packages and by controlling the nesting of topic types. That is, the document type shell -- and not a specialization package -- defines a complete vocabulary.

The association is analogous to the one between topics and maps, but on the design level. Subsets of the same pool of topics can be organized into different deliverables by maps. Similarly, subsets of the pool of specialization packages can be assembled into different complete vocabularies by document type shells.

In DTD syntax, the document type shell is expressed as a *.dtd file and the specialization package is expressed as a *.mod file (or for a domain as a pair of *.ent and *.mod files).

For instance, the current DITA distribution provides both a concept specialization package ("concept.mod") and a concept document type shell ("concept.dtd"). The concept document type shell combines the concept package with the highlight, programming, software, UI, and utility domain packages. The concept package could appear in other document type shells with other combinations of domains or nested topic specializations. That is, despite the shared names, the concept package has no stronger association with the document type shell expressed in "concept.dtd" than with any other shell that uses concept as the root element (aka the document element).

An application needs to recognize the DITA document type shell to process a complete DITA vocabulary with full semantics. For instance, if a dispatcher looks only at the type of the root element:

<concept xmlns="http://dita.oasis-open.org/1.0/package/concept">...

it would invoke a content handler that understands the concept package elements:

<concept id="container">
<title>...</title>
...
<concept id="contained">...</concept>
</concept>

but that content handler could easily encounter a very different set of content elements:

<concept id="container">
<domainTitle>...</domainTitle>
...
<specializedConcept id="contained">...</specializedConcept>
</concept>

because the document type shell assembles the concept package with the someDomain and specializedConcept packages. A content handler that doesn't understand the vocabulary for this document type shell will have to generalize if it understands the DITA architecture, or give up if it doesn't.
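To make the generalization fallback concrete, here is a minimal Python sketch of a handler that knows only the concept package and maps everything else to the most specialized ancestor named in the DITA class attribute. Assumptions: the KNOWN_TYPES set, the class values for the someDomain and specializedConcept packages, and the function names are all hypothetical. Class attributes are normally supplied as defaults by the DTD or schema; they are written out here only because ElementTree does not apply DTD defaults.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple

# Hypothetical: the types a handler written only for the concept package knows.
KNOWN_TYPES: Set[str] = {
    "topic/topic", "topic/title", "topic/body",
    "concept/concept", "concept/conbody",
}


def generalize(class_value: str, known: Set[str]) -> Optional[str]:
    """Return the most specialized type in a DITA class attribute that the
    handler knows, or None if it knows none of them."""
    tokens = class_value.split()
    # tokens[0] is "-" (structural) or "+" (domain); the remaining
    # package/element tokens run from most general to most specific.
    for token in reversed(tokens[1:]):
        if token in known:
            return token
    return None


def dispatch(xml_text: str, known: Set[str] = KNOWN_TYPES) -> List[Tuple[str, Optional[str]]]:
    """Pair each element's local name with the type it would be handled as."""
    report = []
    for elem in ET.fromstring(xml_text).iter():
        local = elem.tag.split("}")[-1]  # drop any "{namespace-uri}" prefix
        report.append((local, generalize(elem.get("class", ""), known)))
    return report


# The second document from the message, with invented class values for the
# hypothetical someDomain and specializedConcept packages.
SAMPLE = """<concept id="container" class="- topic/topic concept/concept ">
  <domainTitle class="+ topic/title someDomain-d/domainTitle ">A title</domainTitle>
  <specializedConcept id="contained"
      class="- topic/topic concept/concept specializedConcept/specializedConcept ">
    <title class="- topic/title ">Nested</title>
  </specializedConcept>
</concept>"""

for local_name, handled_as in dispatch(SAMPLE):
    print(f"{local_name:20} -> {handled_as}")
```

Because the class attribute lists types from most general to most specific, scanning from the right yields the closest type the handler understands; a handler that ignores the class attribute has nothing to fall back on.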

Having each element identify its precise and invariant type by way of the specialization package is critical. Identifying the complete vocabulary by way of the document type shell complements that precision. Ideally, the identifier for the shell document type would be bound to the document as a whole. In fact, in DTD syntax, DITA has that now because DITA declares separate public identifiers for the document type shell and the specialization packages assembled by the shell.

For the root element, the preferred approach might be to bind two namespaces: one for the specialization package that provided the element and one for the document type shell that assembled the complete vocabulary, as in:

<concept
xmlns="http://dita.oasis-open.org/1.0/package/concept"
xmlns="http://dita.oasis-open.org/1.0/shell/concept"
>...

Topic Maps or RDF could do that by explicitly stating the role of each namespace. Unfortunately, in XML alone, an element can only be bound to one namespace. In XML Schema, if elements are bound to the namespace for the specialization package, the package namespace on the root element will be the namespace for the document, and the document type shell won't be identified.
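The one-namespace limit can be checked with any namespace-aware parser. A small Python sketch using the standard library's ElementTree (the shell and package URIs are the ones from the example above; the helper name root_tag is hypothetical): two default-namespace declarations on one element are rejected as a duplicate attribute, and declaring the shell URI under a prefix leaves the element in the package namespace only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

PACKAGE_NS = "http://dita.oasis-open.org/1.0/package/concept"
SHELL_NS = "http://dita.oasis-open.org/1.0/shell/concept"


def root_tag(xml_text: str) -> Optional[str]:
    """Return the root element's expanded name as '{uri}local', or None if
    the text is not well-formed XML."""
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        return None


# Two default-namespace declarations on one element: a duplicate attribute,
# so the document is not well-formed.
both_default = f'<concept xmlns="{PACKAGE_NS}" xmlns="{SHELL_NS}"/>'

# Declaring the shell URI under a prefix is legal, but it only puts the
# prefix in scope; the element itself still has exactly one namespace.
shell_prefixed = f'<concept xmlns="{PACKAGE_NS}" xmlns:shell="{SHELL_NS}"/>'

print(root_tag(both_default))    # None
print(root_tag(shell_prefixed))  # {http://dita.oasis-open.org/1.0/package/concept}concept
```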

Long-winded, but that's the reasoning behind:

> > 5. DITA adopters may use the namespace for the shell document type to
> > identify a document during authoring or processing but must recognize that
> > the shell namespace does not identify the type of the document element.

A footnote to the foregoing: the "dita" element (which provides a container for a list of topics) has the interesting property that it isn't, in itself, typed. So, we _could_ attach the namespace for the document type shell to the dita element without displacing a specialization package namespace.

That is, if people need to use a vocabulary namespace to author or process a topic, the most straightforward workaround _might_ be to wrap the topic in a dita element. That way, in the future, an application seeing

<dita xmlns="http://some.org/dita/enhancedConceptVocabulary">
<concept xmlns="http://dita.oasis-open.org/1.0/package/concept">...

could be confident about both the complete vocabulary and the precise type of the concept element.
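An application might read both identifiers from such a wrapped document along these lines. This is a sketch under stated assumptions: the dita wrapper carries the shell namespace, the topic of interest is its first child, and identify and split_tag are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def split_tag(tag: str) -> Tuple[Optional[str], str]:
    """Split ElementTree's '{uri}local' notation into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def identify(xml_text: str) -> Dict[str, Optional[str]]:
    """Report the vocabulary (shell) namespace, if any, plus the package
    namespace and local name of the topic element."""
    root = ET.fromstring(xml_text)
    root_ns, root_local = split_tag(root.tag)
    if root_local == "dita" and len(root) > 0:
        # Assumed convention: the untyped dita wrapper carries the shell
        # namespace, and its first child is the topic of interest.
        package_ns, topic_type = split_tag(root[0].tag)
        return {"shell": root_ns, "package": package_ns, "type": topic_type}
    # A bare topic: only the package namespace is available.
    return {"shell": None, "package": root_ns, "type": root_local}


WRAPPED = """<dita xmlns="http://some.org/dita/enhancedConceptVocabulary">
  <concept xmlns="http://dita.oasis-open.org/1.0/package/concept" id="c1"/>
</dita>"""

BARE = '<concept xmlns="http://dita.oasis-open.org/1.0/package/concept" id="c1"/>'

print(identify(WRAPPED))
print(identify(BARE))
```

A real dita wrapper can hold several topics; reading only the first child keeps the sketch short.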

> I would state this more strongly: package identifiers *are* namespace
> identifiers. That is, for every package identifier this is/must be
> exactly one corresponding namespace URI declared within the scope of the
> use of the package identifier (that is, on the element or one of its
> ancestors).

In principle, I agree strongly. In practice, my concern is that, to implement this approach, we have to solve problems like swapping namespaces in and out of the class attribute during generalization and respecialization.
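Treating package identifiers as namespace identifiers amounts to a mapping from each package name in the class attribute to a namespace URI. The following Python sketch assumes Eric Sirois's candidate pattern http://dita.oasis-open.org/{version}/package/{specializationPackageIdentifier} from the quoted list; the function names are hypothetical. It also shows why generalization would have to swap namespaces: stepping from hi-d/b back to topic/ph changes the namespace URI, not just the local name.

```python
from typing import List

# Candidate pattern from the quoted proposal (Eric Sirois):
#   http://dita.oasis-open.org/{version}/package/{specializationPackageIdentifier}
PACKAGE_URI = "http://dita.oasis-open.org/{version}/package/{package}"


def class_to_qnames(class_value: str, version: str = "1.0") -> List[str]:
    """Expand each package/element token of a DITA class attribute into an
    ElementTree-style '{namespace-uri}element' name, general to specific."""
    qnames = []
    for token in class_value.split()[1:]:  # skip the leading "-" or "+"
        package, element = token.split("/", 1)
        uri = PACKAGE_URI.format(version=version, package=package)
        qnames.append("{" + uri + "}" + element)
    return qnames


def generalize_one_level(class_value: str, version: str = "1.0") -> str:
    """Expanded name of the element's immediate ancestor type. Both the local
    name and the namespace URI can change, which is the swap that
    generalization would have to perform."""
    qnames = class_to_qnames(class_value, version)
    if not qnames:
        raise ValueError("class attribute has no package/element tokens")
    return qnames[-2] if len(qnames) > 1 else qnames[0]


print(class_to_qnames("- topic/topic concept/concept "))
print(generalize_one_level("+ topic/ph hi-d/b "))
```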

With Paul's clock ticking in my ear, I'm thinking that our initial requirements should be conservative while we work out a stable, tested solution for DITA inheritance and pluggability with namespaces.

On the class attribute question, I'm sorry that I misunderstood before. Can you expand on the benefits of attaching a namespace to the class attribute itself? I see the importance of attaching namespaces to elements whether manifest in the document or latent within the value of the class attribute. I would have thought that, like other attributes, the class attribute itself should be in the same namespace as the element containing it. That would seem less complex for authors and processes.
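For comparison, this is how the two options look to a namespace-aware parser. Under the Namespaces in XML recommendation, an unprefixed attribute such as class has no namespace of its own, while a prefixed one carries its namespace URI. In this Python sketch, urn:example:dita-base is a hypothetical placeholder for the "dita base" namespace of item 0 in the quoted proposal, which names no URI; attribute_keys is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder for the "dita base" namespace of item 0; the proposal gives no URI.
DITA_BASE_NS = "urn:example:dita-base"
CONCEPT_NS = "http://dita.oasis-open.org/1.0/package/concept"
CLASS_VALUE = "- topic/topic concept/concept "

# Option 1: an unprefixed class attribute.
UNQUALIFIED = f'<concept xmlns="{CONCEPT_NS}" class="{CLASS_VALUE}"/>'
# Option 2: class qualified by a DITA-specific namespace (item 0).
QUALIFIED = (f'<concept xmlns="{CONCEPT_NS}" xmlns:dita="{DITA_BASE_NS}" '
             f'dita:class="{CLASS_VALUE}"/>')


def attribute_keys(xml_text: str) -> list:
    """Attribute names as ElementTree reports them; namespace declarations
    themselves are not included."""
    return list(ET.fromstring(xml_text).attrib)


print(attribute_keys(UNQUALIFIED))  # ['class']
print(attribute_keys(QUALIFIED))    # ['{urn:example:dita-base}class']
```

With option 2, a processor looks the attribute up by its expanded name, for example `elem.get("{urn:example:dita-base}class")`, regardless of which prefix an author chose.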

With interest,

Erik Hennum
ehennum@us.ibm.com

Eliot Kimber <ekimber@innodata-isogen.com> wrote on 08/10/2004 09:05:24 AM:

> Erik Hennum wrote:
>
> > As a possible lead-in to discussion of namespaces (slippery little devils),
> > I wanted to summarize my understanding of the resolution that Eliot
> > discovered. Please correct as needed:
>
> 0. The TC will define a distinct namespace for the dita class attribute
> (and, possibly, other universal DITA-specific attributes, such as ID).
> All DITA documents will use this namespace to qualify all instances of
> the class attribute. No other namespaces need be used for DITA documents
> (but may be if desired).
>
> > 1. The TC will define a distinct namespace for each specialization package
> > and document type shell in OASIS DITA (see the candidate list below).
>
> I don't see a need to have a namespace for the document type shells (the
> declaration sets or schema documents). That is, there is no real
> distinction between a specialization package and the DTD declarations
> that define the syntax rules for the types in that package. We may want
> to have normative URIs for the declaration set files, but those URIs
> would *not* be namespace URIs, they would just be persistent resource URIs.
>
> Or said another way: what's important is the abstraction that is the
> *idea* of the specialization package, which is what the namespace
> identifies. The specific declaration set or sets that are an
> implementation projection of the package are more or less arbitrary and
> there may be any number of them for different practical purposes.
>
> > 2. In the first release, the DITA DTD and Schema document types will _not_
> > be distributed with namespace attributes that declare these namespaces
> > directly.
>
> Except for the "dita base" namespace defined in item zero above, which
> qualifies the class attribute--that must be declared in all DTDs,
> schemas, and schema-less document instances.
>
> > 3. Specializers are encouraged to define a distinct namespace for each
> > specialization package and document type shell that they create.
>
> I think this needs to be a hard requirement: c/are encouraged/must/
>
> > 4. Processors are encouraged to treat specialization package identifiers
> > and specialization package namespaces as interchangeable identifiers.
>
> I would state this more strongly: package identifiers *are* namespace
> identifiers. That is, for every package identifier this is/must be
> exactly one corresponding namespace URI declared within the scope of the
> use of the package identifier (that is, on the element or one of its
> ancestors).
>
> > 5. DITA adopters may use the namespace for the shell document type to
> > identify a document during authoring or processing but must recognize that
> > the shell namespace does not identify the type of the document element.
>
> I'm not sure I understand this statement. In particular, I'm not sure
> what you mean by "document type":
>
> - The name of the root element of an instance?
>
> - The system identifier of the applicable declaration set?
>
> - The governing abstract schema?
>
> The namespace of the specialization package should be sufficient to
> identify the *abstract* document type, which is sufficient to condition
> type-based processing.
>
> Remember that syntactic validation by DTD declarations or schemas is
> largely orthogonal to the task of mapping element types to processing,
> so how one finds the appropriate declarations is essentially up to the
> user.
>
> When DTDs are used there's no standards-defined relationship between
> namespaces and DTDs in the way there is for schemas, so for DTDs as long
> as the element types defined in the DTD are consistent with the rules
> defined by the abstract name space then everything's ok. If people feel
> a need for definitive URIs or public IDs for the DTD declarations, I
> wouldn't object, but those URIs would *not* be namespace URIs, they
> would be URIs to concrete resources.
>
> For schemas, where there is both a standards-defined relationship
> between namespaces and schemas (targetNamespace/schemaLocation) I assert
> that good practice is to have exactly one top-level schema document for
> a given namespace. Therefore, my expectation would be that for each
> package namespace there would be exactly one corresponding top-level
> schema document, which I think is what Erik is referring to as the shell
> document types.
>
> That is, you don't need a separate namespace just to refer to the
> schemas--the targetNamespace and schemaLocation attributes would serve
> to bind the package namespaces to the shell schema documents.
>
> > Because a topic type can be used in multiple shells (for instance, shells
> > assembling different combinations of domains), the same DITA topic element
> > could be the document element in several different document types. If the
> > shell namespaces are declared in these documents, a topic element could
> > belong to several different namespaces. The best solution for this problem
> > would be to attach the namespace for the shell document type to something
> > other than a specialization package element. Until we have that solution,
> > people need to be aware of the ambiguity.
>
> I'm not sure I understand this comment. A given element instance can
> only be in one namespace and the namespace it is in is entirely a
> function of its namespace declaration, not the declaration set that
> describes it.
>
> That is, as long as all element type names are fully qualified there can
> be no ambiguity regardless of how element types are combined in a single
> document instance.
>
> Or maybe you're thinking of the implication of using default namespaces?
> I think the answer has to be: namespaces have to either be explicit or
> the default namespace has to be reset on any element that is the root of
> a subtree from a package different from its parent's package. Of course,
> this is only required if you are choosing to qualify element type names
> at all, which isn't ever necessary except to disambiguate name clashes
> between non-DITA-defined specialization packages. So if you rely
> entirely on class mappings and don't worry about namespace qualification
> of element type names, there's no worries at all.
>
> That is, if all processing is based on class mappings, the namespace
> qualification of element type names is either arbitrary or irrelevant,
> depending on the task at hand.
>
> > 6. A future release of DITA may integrate namespaces more directly into
> > the DITA type classification system. This change alone should not result
> > in any changes to element structures or local element names but could
> > change the way namespaces are used on the document element.
>
> Exactly.
>
> > Here follows a list of candidate namespaces in the following formats
> > proposed by Eric Sirois:
> >
> > http://dita.oasis-open.org/{version}/shell/{documentTypeBasename}
> >
> > http://dita.oasis-open.org/{version}/package/{specializationPackageIdentifier}
> >
> > The shell document type namespaces (equivalent to *.dtd and *.xsd) might be
> > as follows:
> >
> > http://dita.oasis-open.org/1.0/shell/topic
> > http://dita.oasis-open.org/1.0/shell/concept
> > http://dita.oasis-open.org/1.0/shell/reference
> > http://dita.oasis-open.org/1.0/shell/task
> >
> > http://dita.oasis-open.org/1.0/shell/ditabase
> >
> > http://dita.oasis-open.org/1.0/shell/map
>
> To re-iterate, I don't see the need for separate namespaces for the
> declaration sets themselves (see discussion above).
>
> > The specialization package namespaces (equivalent to *.mod and the
> > qualifiers on the class attribute) might be as follows:
> >
> > http://dita.oasis-open.org/1.0/package/topic
> > http://dita.oasis-open.org/1.0/package/concept
> > http://dita.oasis-open.org/1.0/package/reference
> > http://dita.oasis-open.org/1.0/package/task
> >
> > http://dita.oasis-open.org/1.0/package/hi-d
> > http://dita.oasis-open.org/1.0/package/pr-d
> > http://dita.oasis-open.org/1.0/package/sw-d
> > http://dita.oasis-open.org/1.0/package/ui-d
> > http://dita.oasis-open.org/1.0/package/ut-d
> >
> > http://dita.oasis-open.org/1.0/package/map
> > http://dita.oasis-open.org/1.0/package/mapgroup-d
>
> These all seem fine to me.
>
> Cheers,
>
> E.
> --
> W. Eliot Kimber
> Professional Services
> Innodata Isogen
> 9390 Research Blvd, #410
> Austin, TX 78759
> (512) 372-8122
>
> eliot@innodata-isogen.com
> www.innodata-isogen.com
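Eliot's alternative to fully qualified names, resetting the default namespace at every subtree root that switches packages, can be sketched as below. Assumptions: each element sits in the namespace of the package that defines it (topic for title and p, concept for conbody, hi-d for b), the URIs follow the candidate package list in the quoted message, and element_namespaces is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

BASE = "http://dita.oasis-open.org/1.0/package/"

# Each element is placed in the namespace of the package that defines it, so
# the default namespace is reset wherever a subtree switches packages.
DOC = f"""<concept xmlns="{BASE}concept" id="c1">
  <title xmlns="{BASE}topic">Resetting defaults</title>
  <conbody>
    <p xmlns="{BASE}topic">Some <b xmlns="{BASE}hi-d">bold</b> text.</p>
  </conbody>
</concept>"""


def element_namespaces(xml_text: str) -> List[Tuple[str, str]]:
    """List (local name, package identifier) for every element, in document order."""
    result = []
    for elem in ET.fromstring(xml_text).iter():
        uri, local = elem.tag[1:].split("}", 1)  # every element here is namespaced
        result.append((local, uri.rsplit("/", 1)[-1]))
    return result


for local, package in element_namespaces(DOC):
    print(f"{local:8} {package}")
```

Because a default declaration is scoped to its element, conbody reverts to the concept namespace without redeclaring it, while each topic or domain subtree has to declare its own.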
