Subject: Re: [dita] Namespace resolution
Erik Hennum wrote:

> An application needs to recognize the DITA document type shell to process a
> complete DITA vocabulary with full semantics. For instance, if a
> dispatcher looks only at the type of the root element:

I think I understand what Erik is getting at here: A given *combination* of specialization packages represents a unique XML *application* and therefore needs an identifier of its own.

I think I agree with that. What I don't agree with is that any external identifier of any external entity (including external declaration sets) *is that identifier*. That is, the XML world currently gives us only one way to unambiguously bind documents to their vocabularies, and that is namespaces.

Given that, I agree that *if* a document needs to be recognized as being governed by a particular XML application, then that application should have an associated namespace and the document should declare it.

Before we go further, I would like to suggest a terminology set that might help us to be clearer in this discussion, as some terms, such as "document type", are far too overloaded to be useful at this point.

Here are my suggested terms:

- *Specialization package*

  A set of element types and supporting components that use the DITA-defined class mechanism to specialize elements from another, more general, package or packages.

  Specialization packages can generally be characterized as "topic" specializations or "domain" specializations.

  "Topic" specializations directly or indirectly specialize the base DITA "topic" element type and its subelements. Topic specializations are intended to govern compound documents that contain complete topics, that is, documents that can be meaningfully processed in isolation as a single unit of delivery.

  "Domain" specializations specialize elements below the "topic" level, often specializing elements that occur within a paragraph context (e.g., elements that identify mentions of names of domain-specific objects such as user-interface components, object class names, method names, part numbers, etc.). Domain specializations are not usually intended to govern entire compound documents. [Such elements might be the roots of documents that are conreffed into larger compound documents but that would not normally be meaningfully processed in isolation.]

  A complete specialization package includes the following components:

    - A prose definition of the abstract (implementation-independent) element types in the vocabulary, including applicable business rules that govern the types' creation, management, and processing. (E.g., a "data dictionary" for the package.)

    - A DTD- or schema-based set of syntax rules for the element types, attributes, and content defined in the package. These definitions should, as much as possible, be designed and implemented to enable easy integration with larger XML applications.

    - Processor components that implement specific business rules in the service of specific business processes (e.g., authoring, management, rendition, interchange, etc.). For example, a specialization package might include XSLT modules or Java classes that implement core behaviors of the package's element types.

- *core DITA packages*

  The specialization packages normatively defined by the DITA specification, i.e., topic, task, concept, reference, and the domain packages. These specialization packages are "well known".

- *user-defined specialization package*

  A specialization package not defined by the DITA specification.

- *DITA application*

  A complete ("encompassing") vocabulary composed from one or more DITA-based specialization packages such that every element type in the application is a direct or indirect specialization of one of the core DITA-defined element types.
  For example, you might combine the "concept" topic specialization package with just the "ui-domain" domain specialization package to create an application for documents that describe user interfaces.

  A complete application definition includes:

    - DTD- or schema-based syntax rules for XML element types, attributes, and content.

    - Prose definitions of the business rules that govern the creation, management, processing, and delivery of documents governed by the application.

    - Processors that implement the business rules in the service of specific business processes (e.g., authoring, management, rendition, interchange, etc.).

  For DITA applications, many of these components may be shared. For example, the DITA-provided schema rules for a given package can be used directly by the schema for a particular application. Likewise, the application's documentation can refer to the DITA-provided documentation to form a complete set of prose definitions.

- *core DITA applications*

  The applications defined normatively by the DITA specification.

- *DITA-using XML application*

  A vocabulary that includes elements that are specializations of DITA elements using the DITA class mechanism but that also includes element types that are not themselves specializations of core DITA types. For example, you might create an XML application that uses just the DITA syntax diagram package. DITA-using applications have the same definitional components as DITA applications and may likewise re-use shared or shareable components.

- *DITA document*

  An XML document instance that uses the DITA-defined class mechanism to bind itself to one or more specialization packages. A DITA document may be governed by either a DITA application or a DITA-using XML application.

- *DITA-only document*

  An XML document instance that is governed by a DITA application.

- *document type (XML)*

  The element type of the root element of a document instance.
Let us agree to *never* use the term "document type" when we mean "XML application" or "XML vocabulary".

Given this terminology, I think the namespace requirements can be stated as:

1. Each specialization package has an associated unique namespace.

2. Each DITA application has an associated unique namespace. Essentially, each distinct combination of packages represents a unique vocabulary. For well-known packages there should be exactly one namespace for each possible useful combination of packages.

3. In order to allow the generic processing of DITA documents, the DITA-defined class attribute should be qualified with its own namespace. This allows processors to reliably identify DITA documents without having any further knowledge of any DITA packages or applications. It also ensures that the name of the DITA-defined class attribute will be invariant across all possible DITA applications. Let us call this the "DITA base namespace".

Given these rules, and remembering that for the purposes of mapping element *instances* to processing all that is required is the class attribute value, I think all of the following will be possible in a 1.0 time frame:

1. Legacy DTDs and documents in which the only change is the qualification of the class attribute to use the DITA base namespace. Existing DITA-aware processors would need to be updated to recognize the qualified form of the class attribute, but we've already demonstrated that this is easy to do and can be done in a way that allows the processing of both qualified and unqualified class attributes.

2. Documents that use only the DITA application namespaces to identify the DITA application that governs them. Note that this is independent of whether or not the document is DTD- or schema-based. As long as this namespace is the default namespace, element instances need not change.

3. DTD-less and schema-less documents that are both recognizable as DITA documents and, if all class attributes are made explicit on elements, processible by normal class-based DITA processors.

4. DITA applications in which all element type names are in the same namespace (the application namespace) regardless of which package those elements are drawn from. This is possible because all DITA-specific processing should be conditioned on the class attribute values, so element type names are essentially arbitrary.

5. DITA applications in which element type names are qualified with their corresponding package namespaces. This is possible for the same reason (4) is possible: element type names are arbitrary.

This approach would mean the following, I think:

1. The DITA 1.0 spec does not have to *require* the use of namespaces for anything but the class attribute, and the existing DITA-provided DTDs and schemas need only be changed to add this qualification--there is no need for them to also qualify element type names or package names with application or specialization namespaces. That is, the current IBM-submitted DTDs and schemas can be used essentially as-is in 1.0. This means that conforming DITA documents need not have any namespaces for the element type names, reflecting current IBM DITA practice.

2. The namespace prefixes for the core DITA packages are "magic" and must be used as-is in class attribute values in DITA 1.0. This avoids any requirement for DITA 1.0 processors to be prepared to dereference core package names to namespace URIs.

3. The DITA 1.0 spec can *discuss* the other ways in which namespaces _can_ be used in conforming DITA applications without actually requiring them or using them in the OASIS-provided DTDs and schemas.

4. DITA-aware tools can use the DITA base namespace to reliably and unambiguously recognize DITA documents and to distinguish DITA documents from non-DITA documents.
   For example, the requirements of my XIRUSS-T content management tool are met.

5. Tools that condition processing based on the *DITA application* being used can either continue to use external identifiers to indicate the application (the current mechanism) or can require the use of application-specific namespaces if they so choose (for example, both XIRUSS-T and Microsoft Word would have to require the use of application namespaces, as neither uses DOCTYPE declarations in any way).

   Note that this enables the use of the existing DTD and schema files as provided but does not require their use--the ultimate test of DITA conformance must still be architectural validation. If someone wants to have namespace-qualified element types, they'll need to create their own versions of the DTDs or schemas that add the appropriate namespace declarations and qualifications. There might be a way to enable this in the DTDs using parameter entities, but I'm not sure it's worth the effort to do it.

6. User-defined specialization packages *must* be namespace qualified, and DITA processors should expect to have to dereference non-core package names used in class attributes to namespace URIs. I don't see a way around this, as the alternative is to accept the potential for unresolvable package name collision in class attribute values. I don't think this is a hardship in practice. It does suggest that perhaps there are at least two levels of conformance for DITA processors: those that only recognize core DITA packages and those that can handle all packages.

   Because the class attribute values would still use the same syntax they do today, existing techniques for matching class values would still work; it would just be up to document authors or DITA application builders to ensure that all package names are unique and consistent (for example, to enable reliable binding of CSS styles to class attribute values).

>>I would state this more strongly: package identifiers *are* namespace
>>identifiers.
>>That is, for every package identifier this is/must be
>>exactly one corresponding namespace URI declared within the scope of the
>>use of the package identifier (that is, on the element or one of its
>>ancestors).
>
>
> In principle, I agree strongly. In practice, my concern is that, to
> implement this approach, we have to solve problems like swapping namespaces
> in and out of the class attribute during generalization and
> respecialization.

I'm not sure I understand this comment: the value of the class attribute is (conceptually) just a list of namespace prefixes that map to the URIs for packages. The class attribute value need never change.

If we ignore schema-less documents for the moment, then at worst it would require a DITA application implementor to update the *fixed* values of class attributes in order to disambiguate the prefixes of two user-defined specialization packages that happen to use the same prefix value by default. But this is a very small cost and only affects the application implementor--it would not affect authors or processor implementors (at least for processors that can dereference package prefixes to namespace URIs).

For schema-less documents, where all the class attributes would be explicit on each element, you would have to rewrite all the class values in order to use that document with an updated DITA application, but even that can be handled through a trivial transform and is probably a rare case in any event (the use of schema-less documents not being particularly good practice in most use cases).

> On the class attribute question, I'm sorry that I misunderstood before.
> Can you expand on the benefits of attaching a namespace to the class
> attribute itself? I see the importance of attaching namespaces to elements
> whether manifest in the document or latent within the value of the class
> attribute.
> I would have thought that, like other attributes, the class
> attribute itself should be in the same namespace as the element containing
> it. That would seem less complex for authors and processes.

I think you've got it exactly backward: In a DITA context, there is *no particular value* to having element type names in any particular namespace precisely because they are identified, for processing purposes, entirely by the values of the class attributes. This is independent of authoring concerns (user interface)--in that context element type names are important, but since they are also arbitrary with respect to DITA processing they essentially factor out of this discussion.

I also realized that I'm making an assumption that might not be universal: the value of the class attribute fully qualifies the element type. That is, if the element type name is "concept" then the class attribute value is " topic/topic concept/concept ", and if the element type is a further specialization of "concept", "myconcept", then the class attribute value is " topic/topic concept/concept mypackage/myconcept ". As long as this is always the case, the element type name is simply irrelevant for the purpose of DITA-based processing. That is, from a DITA perspective, the element type name is, by definition, a synonym for the element's class name.

For example, consider this declaration set intended to create the smallest possible document instances:

  <!ELEMENT a (b, c+) >
  <!ATTLIST a
    id             NMTOKEN #REQUIRED
    ditabase:class CDATA   #FIXED " topic/topic concept/concept "
    xmlns:ditabase CDATA   #FIXED "http://dita.oasis-open.org/1.0/DITA base" >
  <!ELEMENT b (#PCDATA) >
  <!ATTLIST b ditabase:class CDATA #FIXED " topic/title " >
  <!ELEMENT c (#PCDATA) >
  <!ATTLIST c ditabase:class CDATA #FIXED " topic/p " >

This might not be easy to author, but it's 100% understandable by a DITA-aware processor and is a completely valid DITA instance (assuming I've actually declared the elements correctly, which I haven't bothered to check).
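To make the point about arbitrary element type names concrete, here is a minimal sketch in Python of class-based matching against an instance of the a/b/c vocabulary above. It is an illustration only, not part of any DITA toolkit: the namespace URI `urn:example:dita-base` is a hypothetical stand-in for the DITA base namespace, and the helper names `class_tokens` and `is_a` are invented for this example. The class attributes are written explicitly on each element, as they would be in a DTD-less document.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI standing in for the "DITA base namespace" (an assumption).
DITA_BASE_NS = "urn:example:dita-base"
# ElementTree exposes namespace-qualified attributes as "{uri}localname".
CLASS_ATTR = "{" + DITA_BASE_NS + "}class"

# Instance of the minimal a/b/c vocabulary with explicit class attributes.
DOC = """\
<a xmlns:ditabase="urn:example:dita-base" id="t1"
   ditabase:class=" topic/topic concept/concept ">
  <b ditabase:class=" topic/title ">Widgets</b>
  <c ditabase:class=" topic/p ">First paragraph.</c>
  <c ditabase:class=" topic/p ">Second paragraph.</c>
</a>"""


def class_tokens(elem):
    """Return the package/type tokens of the qualified class attribute."""
    return elem.get(CLASS_ATTR, "").split()


def is_a(elem, token):
    """True if the element is, or specializes, the given package/type."""
    return token in class_tokens(elem)


root = ET.fromstring(DOC)
# Matching looks only at class values; the names a, b, and c are never consulted.
titles = [e.text for e in root.iter() if is_a(e, "topic/title")]
paragraphs = [e.text for e in root.iter() if is_a(e, "topic/p")]
```

Matching on whole whitespace-separated tokens, rather than on substrings, keeps a token such as "topic/p" from accidentally matching a longer type name that merely begins with it.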
The class attribute is the one invariant in the system that enables all other processing. Therefore it needs to be in its own, independent, namespace so that processors can always find it reliably.

One way to think about this is to identify the layers of processing and what information you need to reliably map data to processing at each level.

Starting with an input XML document, a processor has to ask the following questions:

1. Is this document a DITA document? (That is, does it use the DITA-defined class mechanism to bind its elements and attributes to the DITA element types and attributes?)

   This question can be answered by looking for the DITA base namespace (which qualifies the DITA-specific class attribute). If this namespace declaration is found somewhere in the document then the document *must* be a DITA document. If this namespace declaration is not found then the document *cannot be* a DITA document.

   Answer: no -> Go to question 3

   Answer: yes -> Document is a DITA document, proceed to question 2:

2. Is this DITA document governed by a DITA application I recognize?

   This question can be answered unambiguously by looking for namespace declarations that name known DITA application namespace URIs on the root element. It can be answered with reasonable (but not 100%) certainty by looking at the external identifier of the document's DOCTYPE declaration or non-namespace schema, or by taking the user's word that this is in fact a DITA document governed by a particular application [this is the implication when you apply an application-specific XSLT to a document, for example, or when you work in an environment that only supports one XML application].

   Answer: no -> Go to question 3

   Answer: yes -> Document is a DITA-only document. Apply processing for the application(s) that apply to the document.

3. Does this document conform to a non-DITA application I recognize?

   This question is answered as for question (2): look for known namespaces, external DTD subsets, schema instances, or take the user's word for it.

   Answer: no -> Apply DITA-specific processing based on class attribute values.

   Answer: yes -> Apply application-specific processing (which may itself apply DITA-specific processing if the application happens to be DITA-aware).

Notice that at *no point* in this chain of questions does the namespace qualification of element types come into play. All that is important is the declaration or non-declaration of namespace URIs--we absolutely don't care how those namespaces are subsequently used on elements.

The only point at which namespace qualification could become an issue is in application-specific processing of elements that is not done based on their DITA class values.

Cheers,

E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com
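The three-question chain in this message can be sketched as a small dispatcher. The Python below is one possible reading under stated assumptions, not a reference implementation: all three namespace URIs are hypothetical placeholders, the function names are invented for this example, and only the namespace-based tests are implemented (the DOCTYPE, schema, and "user's word" fallbacks are left out). It follows the chain as written, so a "no" at question 3 falls through to generic class-based processing.

```python
import io
import xml.etree.ElementTree as ET

# All URIs below are hypothetical placeholders, not real OASIS values.
DITA_BASE_NS = "urn:example:dita-base"
KNOWN_DITA_APPS = {"urn:example:dita-app:ui-concept"}
KNOWN_OTHER_APPS = {"urn:example:other-app"}


def declared_namespaces(xml_text):
    """Return (URIs declared on the root element, URIs declared anywhere)."""
    root_uris, all_uris = set(), set()
    elements_started = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            elements_started += 1
        else:
            _prefix, uri = payload
            all_uris.add(uri)
            # "start-ns" events are reported before the "start" event of the
            # element that carries the declaration, so anything seen before
            # the first "start" belongs to the root element.
            if elements_started == 0:
                root_uris.add(uri)
    return root_uris, all_uris


def classify(xml_text):
    """Walk questions 1 to 3 using only namespace declarations."""
    root_uris, all_uris = declared_namespaces(xml_text)
    # Question 1: is the DITA base namespace declared anywhere?
    is_dita = DITA_BASE_NS in all_uris
    # Question 2: does the root declare a known DITA application namespace?
    if is_dita and root_uris & KNOWN_DITA_APPS:
        return "dita-application"
    # Question 3: does the root declare a known non-DITA application namespace?
    if root_uris & KNOWN_OTHER_APPS:
        return "other-application"
    # Question 3 answered "no": fall back to class-attribute-based processing.
    return "class-based"
```

Note that nothing in `classify` inspects element type names or their prefixes; only the presence or absence of namespace declarations drives the decision.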