Subject: Re: [dita] Namespace resolution
From: Eliot Kimber <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Tue, 24 Aug 2004 11:08:12 -0500

Erik Hennum wrote:

> 2.  Vocabulary processing
> 
> Content is validated and processed against the vocabulary as a whole rather
> than against the individual specialization modules integrated by the
> vocabulary.

I think that in fact content can be validated both against its direct 
governing vocabulary and against the individual modules from which the 
vocabulary is composed. If the vocabulary is a "DITA application" then 
at most the vocabulary can only add constraints to the specialization 
modules, which means that content that might be invalid with respect to 
the overall vocabulary might still be valid with respect to a particular 
module.

For example, if a processor is a generic DITA processor it will only 
care about validation against the specialization packages because it has 
no knowledge of any other rules (knowing only about the DITA-defined 
rules).

I don't think it's necessary to make absolute statements about 
validation--validation is a type of processing that serves specific 
business tasks and requirements. The most we can say with certainty is 
what can and cannot be validated using a particular technology (e.g., 
DTDs, schemas, Schematron, one-off validation applications, human 
inspection, and so on).

It is important to identify the types of validation that are possible 
and when such validation can or should be done.

> For instance, when resolving a conref, processes are obligated to check
> whether the vocabulary for the referencing document includes the
> specialization modules for the elements in the referenced content.
> Otherwise, the conref could include invalid elements.

I understand the motivation for this statement but it's not that simple, 
at least in the general case of doing transclusion (the current DITA 
definition of conref may be more constraining, possibly too constraining).

In the general case, whether a given transcluded element is "valid" or 
not is entirely a function of the thing that is processing the result. 
For some applications any combinations will be valid (meaning they can 
be meaningfully processed). For other applications only exact element 
type matches are valid. And for others, transclusion of any compatible 
class of the referencing element is valid (e.g., any class within the 
same specialization hierarchy).
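
The three cases above can be sketched as selectable policies. This is only an illustration, not anything DITA defines: the `Policy` names and the `transclusion_ok` function are hypothetical, and "same specialization hierarchy" is read here, as an assumption, to mean that both class values start from the same base class.

```python
from enum import Enum


class Policy(Enum):
    ANY = "any"                        # every combination is acceptable
    EXACT = "exact"                    # class values must match exactly
    SAME_HIERARCHY = "same-hierarchy"  # classes must share a base class


def class_chain(class_value):
    """Split a value such as '- topic/ph module1/specializedPh' into class tokens."""
    return class_value.split()[1:]  # drop the leading '-' or '+' marker


def transclusion_ok(referencing, referenced, policy):
    """Decide whether `referenced` may be transcluded where `referencing` sits."""
    ref_chain = class_chain(referencing)
    target_chain = class_chain(referenced)
    if policy is Policy.ANY:
        return True
    if policy is Policy.EXACT:
        return ref_chain == target_chain
    # SAME_HIERARCHY (assumed reading): both chains start from the same base class.
    return bool(ref_chain) and bool(target_chain) and ref_chain[0] == target_chain[0]
```

Which policy applies is a property of the thing processing the result, so a real processor would presumably take it as configuration rather than hard-code it.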

> If every element can be processed in isolation, the specialization modules
> can provide complete processing.  If the processing requires contextual
> sensitivity, however, the vocabulary has to be able to affect the
> processing. After all, the vocabulary controls the context.

I'm not sure I fully understand the use of "processing" in this case. 
That is, by "vocabulary" do you mean "application" (as I've been using 
it)? I'm trying to keep a clear distinction between the "vocabulary", 
which is the definition of the set of types, and the "application", 
which is the combination of a vocabulary and a set of business rules 
that define how elements in the vocabulary must or can be processed. 
It's a subtle distinction but I think it's important to make in XML 
because of the fact that XML content (and by extension, XML document 
constraint specifications such as DTDs and schemas) are entirely 
declarative and provide no *processing* specifications. Processing 
specifications are entirely in the domain of prose and software 
component implementations (e.g., style sheets, Java objects, etc.).

Or more simply: vocabularies define content constraints, applications 
define processing for the content. There may be multiple applications 
associated with a single vocabulary.

> For instance, in one domain, I've specialized section as backgroundSection
> so my topics can include background content.  In another domain, I've
> specialized title as safetyInstructionTitle so I can include safety
> instructions as either a topic or section.  I now create a vocabulary that
> integrates the two domains, so I can have background sections that provide
> safety instructions.  In the same way that a term within a dlentry has
> processing expectations, a backgroundSection that contains a
> safetyInstructionTitle could have processing expectations (perhaps of
> isolation, implemented as a sidebar for some outputs).  Only the vocabulary
> can specify the processing expectations for the combination of the two
> elements.  After all, the background and safety modules might be supplied
> by designers who are completely unaware of one another's specializations.

To apply my terminology: I would say "Only the application can specify 
the processing expectations for the combination...."

> Note that this processing expectation is part of the semantics of the
> vocabulary.  Different applications may realize those processing
> expectations in different ways.

And here, just to continue to be pedantic, I would use the term 
"processors" instead of "applications". That is, I'm trying to use the 
term "application" in the sense originally used in the SGML standard 
(the set of rules associated with a document type) not in the sense of 
"a set of software components that perform a task".

I realize it's hair splitting to a degree but there is so much potential 
confusion and so much abstraction that without very precise terminology 
misunderstanding is a certainty.

> 3.  Element polymorphism

> We don't want to limit processing of DITA content, however, to
> DITA-sensitive applications -- especially where existing vocabularies are
> being retrofitted as DITA vocabularies.  For DITA-insensitive applications,
> the declared element type is everything and the class attribute is nothing.

Remember I'm not saying that, for example, the element types in the 
DITA-supplied reference schemas should be arbitrary--far from it. I'm 
just pointing out that in DITA-based applications there need be no 
general constraint on element type names. At a minimum we can say that 
element type names may or may not be namespace qualified. Or we can say 
that fully-conforming DITA processors must use DITA class attribute 
values to apply DITA processing semantics to elements, meaning that 
element type names are unconstrained.
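
As a rough sketch of what class-driven processing could look like, the following picks a handler purely from the class attribute value, trying the most specific class first. The handler table, the `select_handler` name, and the `frase` element type are made up for illustration, and the sketch uses an unqualified `class` attribute as in the examples in this thread.

```python
import xml.etree.ElementTree as ET


def select_handler(element, handlers):
    """Pick a handler from the class attribute, most specific class first."""
    classes = element.get("class", "").split()[1:]  # skip the '-'/'+' marker
    for class_name in reversed(classes):
        if class_name in handlers:
            return handlers[class_name]
    return None  # no class value this processor recognizes


# Hypothetical handlers keyed by class name, never by element type name.
handlers = {"topic/ph": lambda el: f"[phrase: {el.text}]"}

# The element type name 'frase' is arbitrary; only the class value matters.
element = ET.fromstring(
    '<frase class="- topic/ph module1/specializedPh">hola</frase>'
)
rendered = select_handler(element, handlers)(element)
```

Because no handler is registered for `module1/specializedPh`, the lookup falls back to the `topic/ph` handler, which is the behavior a generic processor would need for unknown specializations.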

I don't think as standards writers we need to mandate the 
interchangeability of document instances--it is sufficient to define a 
mechanism by which instances can be maximally interchangeable, which 
would be by having all element type names be the same as DITA-defined 
class names.

> In addition, the declared element type is displayed to human readers of the
> content to guide their understanding of the semantics of the content.
> 
> Because the actual element name is important for these purposes, the DITA
> architecture mandates support for generalization and respecialization
> operations to change the declared element type.

I'm not sure I understand this comment. Element type names are important 
but they are not important *to the DITA standard*. They are important 
to designers and implementors of DITA-based applications.

Remember that the DITA-defined specialization packages define element 
*classes* not element types--element types only exist in DITA-based 
vocabularies. So while the DITA standard will define a set of classes 
whose names must be carefully thought out, the element type names used 
in a given DITA-based application are still arbitrary.

>>
>>5. DITA applications in which element type names are qualified with
>>their corresponding package namespaces. This is possible for the same
>>reason (4) is possible: element type names are arbitrary.
> 
> 
> Would the root element for the DITA content have to declare both the
> namespace for the vocabulary and the namespace for the element's
> specialization module?

Yes, assuming the specialization module is not a "magic" DITA-defined 
core module.

> For instance, how would a specialized topic declare both the namespace for
> its specialization module and the namespace for the vocabulary that's
> combining it with other topic types and domains?  As in the illegal:
> 
> <specializedTopic
>     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
>     xmlns="http://some.org/dita/module/specializedTopic"
>    class="- topic/ph
> http://some.org/dita/module/specializedTopic#specializedTopic ">

The specialization modules must be associated with a prefix, so there 
can never be a conflict with the document's default namespace. Thus your 
example should be:

 > <specializedPh
 >     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
 >     xmlns:module1="http://some.org/dita/module/specializedTopic"
 >    class="- topic/ph
 >             module1/specializedPh">

Remember that for the purpose of determining whether a given namespace 
is "in scope" for an element you only need to examine the declarations 
and you don't care what the prefixes are. That is, if my application is 
going to examine the above element to see if the "specializedTopic" 
namespace is in scope I would simply examine all the namespace 
declaration attributes to see if any of them contain the expected URI.
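
A minimal sketch of that check, assuming Python's standard `xml.dom.minidom` (which keeps namespace declarations as ordinary `xmlns` attributes). The function name is hypothetical. It walks from the outermost ancestor inward so that an inner redeclaration of a prefix shadows an outer one, then ignores the prefixes and returns only the URIs.

```python
from xml.dom import minidom


def in_scope_namespace_uris(element):
    """Return the set of namespace URIs in scope for a minidom element."""
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    bindings = {}
    for el in reversed(chain):  # outermost first, so inner declarations win
        attrs = el.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                bindings[""] = attr.value
            elif attr.name.startswith("xmlns:"):
                bindings[attr.name[len("xmlns:"):]] = attr.value
    return {uri for uri in bindings.values() if uri}


doc = minidom.parseString(
    '<specializedPh'
    ' xmlns="http://some.org/dita/vocabulary/specializedVocabulary"'
    ' xmlns:module1="http://some.org/dita/module/specializedTopic"'
    ' class="- topic/ph module1/specializedPh"><child/></specializedPh>'
)
uris = in_scope_namespace_uris(doc.documentElement.firstChild)
```

Testing whether the "specializedTopic" namespace is in scope is then a simple membership test on `uris`, with no reference to the `module1` prefix.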

>>2. The namespace prefixes for the core DITA packages are "magic" and
>>must be used as-is in class attribute values in DITA 1.0. This
>>avoids any requirement for DITA 1.0 processors to have to be prepared to
>>dereference core package names to namespace URIs.
>>
>>3. The DITA 1.0 spec can *discuss* the other ways in which namespaces
>>_can_ be used in conforming DITA applications without actually
>>requiring it or doing it in the OASIS-provided DTDs and schemas.
> 
> 
> It's an inspired compromise for 1.0 to treat specialization module
> qualifiers as magically bound to namespaces that aren't actually declared
> on the element.  I'd like to see it applied to both core DITA and non-core
> specialization modules so we don't have a two-tier typing scheme.

I assume by "two-tier typing scheme" you mean a typing scheme applied to 
some modules but not others?

I agree, consistency seems to be paramount here.

I think there is very little risk in having module prefixes bound to 
namespaces since it doesn't affect existing processors in any way and it 
doesn't affect element type naming or schema construction, apart from 
requiring the namespace declarations, which is standard XML syntax and 
doesn't change the processing any tool would do (that is, 
namespace-aware processors will handle the declarations as they would 
anyway and namespace-unaware processors will continue to ignore them).

I don't see how making module prefixes globally unique names can be 
controversial.

>>>In principle, I agree strongly.  In practice, my concern is that, to
>>>implement this approach, we have to solve problems like swapping
>>>namespaces in and out of the class attribute during generalization and
>>>respecialization.
>>
>>I'm not sure I understand this comment: the value of the class attribute
>>is (conceptually) just a list of namespace prefixes that map to the URIs
>>for packages. The class attribute value need never change.
> 
> 
> Sorry, I was obscure.  The class attribute doesn't change, but the
> namespace on the element would have to change during generalization and
> respecialization.
> 
> For instance, here's the element before generalization
> 
> <specializedPh
>     xmlns="http://some.org/dita/module/specializedDomain"
>    class="- topic/ph
> http://some.org/dita/module/specializedDomain#specializedPh ">
> 
> and after generalization
> 
> <ph
>    class="- topic/ph
> http://some.org/dita/module/specializedDomain#specializedPh ">
> 
> If the namespace isn't changed, the element will be in either no namespace
> or the wrong namespace and thus won't be valid.

By generalization I assume you mean "generating a new instance whose 
element types are superclasses of the original input element."

I think there is confusion about where the namespace is applied, as 
discussed above.

The namespace for the specialization package is *never* the namespace of 
the element type. Therefore, during generation of a generalized instance 
you would be rewriting the class attribute and, presumably, changing the 
element type name. It would not be necessary to remove declarations of 
namespaces not used in the generalized instance but you could if you 
wanted to. That is, there's no problem with declaring namespaces that 
are never used.

So I think your example could be:

  <specializedPh
      xmlns:module1="http://some.org/dita/module/specializedDomain"
     class="- topic/ph
              module1/specializedPh">

  and after generalization

  <ph
      xmlns:module1="http://some.org/dita/module/specializedDomain"
      class="- topic/ph">

Note that the class= attribute has been rewritten but the namespace 
declaration has not been removed. But it could be:

  <ph
      class="- topic/ph">

Both "ph" instances are equivalent and would be processed the same way.
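
The rewrite described above can be sketched in a few lines, assuming Python's standard `xml.etree.ElementTree` and the "module/type" class token syntax used in these examples. The function name is hypothetical, and it generalizes by exactly one level: it drops the most specific class and renames the element to the type name of the class that is left.

```python
import xml.etree.ElementTree as ET


def generalize_one_level(element):
    """Rewrite class and element type name to the immediate superclass."""
    tokens = element.get("class", "").split()
    if len(tokens) < 3:  # marker plus a single class: nothing to generalize
        return element
    marker, classes = tokens[0], tokens[1:-1]  # drop the most specific class
    _module, _, type_name = classes[-1].partition("/")
    element.tag = type_name
    element.set("class", marker + " " + " ".join(classes))
    return element


source = ET.fromstring(
    '<specializedPh'
    ' xmlns:module1="http://some.org/dita/module/specializedDomain"'
    ' class="- topic/ph module1/specializedPh">text</specializedPh>'
)
generalized = generalize_one_level(source)
```

Note that ElementTree does not preserve the unused `module1` declaration when the result is serialized, which corresponds to the second "ph" form above.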

I should note also that this notion of the generation of literal 
generalized instances is not something that the DITA specification needs 
to define--this type of processing is simply one of many types of 
processing that might be applied to DITA documents and the ability to do 
it is inherent in the nature of the specialization mechanism.

>>
>>As long as this is always the case then the element type name is simply
>>irrelevant for the purpose of DITA-based processing. That is, from a
>>DITA perspective, the element type name is, by definition, a synonym for
>>the element's class name.
> 
> 
> Agreed, for DITA-sensitive applications, the element name is irrelevant.
> 
> DITA content also should be processable, however, by DITA-insensitive
> applications. For those applications as well as for human consumption,
> the DITA architecture needs to support changing the element name --
> effectively, casting to a different declared type.

I'm still not understanding this comment: the DITA specification can 
only define processing in terms of class values. The element type name 
value is simply outside the scope of the DITA architectural mechanism.

We can state that there is a class of simple DITA processors that expect 
element type names to be the same as DITA-defined class names and that 
in order to satisfy such processors one is encouraged to make element 
type names the same as leaf class names but I see no reason to require 
that so I see no reason for the architecture mechanism to say anything 
about element type names at all.

> 
>>...
> *  By the namespace on the root element if the namespace matches that of a
> known DITA vocabulary

A document need not be rooted at a DITA element. A document is a DITA 
document if *any element* is derived from a DITA-defined type. A 
document is a "DITA-only document" if its root is a DITA-defined type 
and all elements are likewise derived from DITA-defined types.
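
These two definitions translate directly into a small classifier. The sketch below assumes, for illustration only, that "derived from a DITA-defined type" can be detected by an unqualified `class` attribute whose value looks like `- module/type ...`; the regular expression and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed shape of a DITA class value: a '-' or '+' marker followed by
# one or more whitespace-separated 'module/type' tokens.
DITA_CLASS = re.compile(r"^[-+](\s+[^\s/]+/[^\s/]+)+\s*$")


def is_dita_derived(element):
    return bool(DITA_CLASS.match(element.get("class", "")))


def classify(root):
    """Classify a parsed document by how many elements are DITA-derived."""
    flags = [is_dita_derived(el) for el in root.iter()]  # root comes first
    if all(flags):
        return "DITA-only document"
    if any(flags):
        return "DITA document"
    return "not a DITA document"


mixed = ET.fromstring('<report><ph class="- topic/ph">x</ph></report>')
pure = ET.fromstring(
    '<topic class="- topic/topic"><title class="- topic/title">T</title></topic>'
)
```

Here `classify(mixed)` reports a DITA document even though the root `report` element is not DITA-derived, and `classify(pure)` reports a DITA-only document.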

> Regardless of whether the class attribute is namespaced, wouldn't these
> tests have to be performed anyway and in the same way?
> 
> That is, couldn't a content management system such as XIRUSS-T use the
> following approach?
> 
> 1.  Is a namespace declared on the root element?  If so, match the known
> namespaced vocabularies including the known DITA vocabularies.

Yes, but not limited to the root element--declared anywhere within the 
document.
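
A sketch of that broader test, assuming Python's standard `xml.etree.ElementTree`: its `iterparse` function reports every namespace declaration in the document as a `start-ns` event, wherever it occurs. The handler table and function names are hypothetical; the URI is the example vocabulary namespace used earlier in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical registry mapping known vocabulary namespaces to handlers.
KNOWN_VOCABULARIES = {
    "http://some.org/dita/vocabulary/specializedVocabulary":
        "specialized vocabulary handler",
}


def declared_namespaces(xml_text):
    """Collect every namespace URI declared anywhere in the document."""
    uris = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        uris.add(uri)
    return uris


def match_vocabularies(xml_text):
    """Return the handlers for all known vocabularies the document declares."""
    found = declared_namespaces(xml_text)
    return sorted(KNOWN_VOCABULARIES[uri] for uri in found if uri in KNOWN_VOCABULARIES)


sample = (
    '<report><body'
    ' xmlns:v="http://some.org/dita/vocabulary/specializedVocabulary">'
    '<v:p/></body></report>'
)
```

With this input `match_vocabularies(sample)` finds the vocabulary even though the root `report` element declares no namespace at all.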

> 2.  Is a DTD declared for the document?  If so, match the known
> vocabularies with public identifiers including the DITA vocabularies.

XIRUSS-T, like MS Word, doesn't do anything with DTDs. So no, this 
wouldn't work. In any case, this is not reliable because the external 
DTD subset is not a 100% reliable way to determine the true document type 
of a document.

> 3.  Is a Schema declared for the document?  If so, match the known
> vocabulary declarations including the DITA vocabulary declarations.

If the schema is bound via the noNamespaceSchemaLocation attribute then 
it is no better than an external DTD subset.

If the schema is bound via schemaLocation then a namespace is necessarily 
declared as well, so I don't need to look at the schema.

> 4.  Prompt the user for known vocabularies including the DITA vocabularies.

This is reliable to the degree the user can answer the question 
accurately but is not general in the sense that it does not support the 
use case of generic processors acting on documents without further input.

> If a namespace on the class attribute doesn't reduce the number of tests
> needed to match content with a handler, would it make sense to defer
> namespacing the class attribute until the full namespace solution is
> specified?  That way, we keep our options open in case something else in
> the solution makes it unnecessary to namespace the class attribute?

I don't think so. Part of the point of namespacing the class attribute 
is to ensure that the DITA class attribute can always be distinguished 
from other class attributes. For example, I have existing document types 
that have a class attribute--if I wanted to retrofit those to use DITA 
as their underlying architecture I would have to change one attribute 
name or the other. Therefore, requiring the class attribute to be 
qualified ensures that it can always be distinguished, at minimal cost.

Remember too that attributes, by definition, are not in any namespace 
unless they are qualified. That is, putting an element in a DITA-defined 
namespace *does not* put the attributes of that element in a 
DITA-defined namespace. Many specifications ignore this but nevertheless 
it is the case.

Thus if the class attribute is not qualified it cannot be reliably 
recognized as being the DITA class attribute.
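
This is easy to confirm with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI is a placeholder, since no DITA namespace has been defined, and `dita_class` is a hypothetical helper. The element is in the default namespace, but its unprefixed `class` attribute is in no namespace, so a legacy `class` and a qualified `dita:class` can coexist on the same element.

```python
import xml.etree.ElementTree as ET

DITA_NS = "http://example.org/dita"  # placeholder URI, not a real DITA namespace

element = ET.fromstring(
    f'<ph xmlns="{DITA_NS}" xmlns:dita="{DITA_NS}"'
    ' class="legacy-value" dita:class="- topic/ph"/>'
)


def dita_class(el):
    """Return the namespace-qualified DITA class value, or None if absent."""
    return el.get(f"{{{DITA_NS}}}class")


element_name = element.tag            # qualified by the default namespace
attribute_names = sorted(element.attrib)  # one bare name, one qualified name
```

A processor that looks only for the qualified attribute never mistakes the legacy `class` value for a DITA class value.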

Qualifying the class attribute, and only the class attribute, also 
ensures that documents are bound to a DITA-defined namespace without 
constraining or further complicating any other processing or requiring 
the declaration of namespaces used only within class attribute values 
(which will usually be limited to "magic" DITA-defined prefixes).

That is, I don't see any great risk to qualifying the class attribute 
and much benefit from doing it. Doing this would not in any way affect 
how we might use namespaces in the future for either class attribute 
values or element type names.

> For instance, if in 2.0, the namespace for the base DITA topic module ends
> up declared in the class attribute value, would declaring the namespace on
> the class attribute itself become redundant?
> 
> <ph class="- http://dita.oasis-open.org/modules/topic#ph ">

I never intended that module namespaces be declared in the class 
attribute--there are a number of syntactic reasons why this would be a bad 
idea and in any case it's not necessary.

>>This again means
>>that element type names or details such as whether or not applications
>>use namespace qualification need not be a direct concern to the DITA
>>specification itself.
> 
> 
> If (as suggested above) vocabularies are a core construct for the DITA
> architecture, the namespaces used to identify vocabularies are a concern of
> the DITA architecture.
> 
> Also, in the future, there's a strong argument for DITA to incorporate
> namespaces into the typing system to identify specialization modules so we
> can have unambiguous element types.
> 
> Those reasons suggest that the DITA specification shouldn't leave
> namespaces entirely to the discretion of the application.

I'm only talking about the namespace qualification of element type 
names, not the namespaces used for modules or DITA-defined attributes. A 
namespace-qualified element type name is no different from an 
unqualified one as far as the DITA specification is concerned: it's an 
arbitrary name. That means it's up to a given DITA-using vocabulary how 
to define what and how namespaces are used for the element type names in 
that vocabulary.

>>But the DITA standard is *not* primarily an authoring support system. It
>>is a generic standard that defines core types and processing semantics
>>that in turn provides a solid basis from which task-specific authoring
>>support systems can be built. That's a key difference and requires a
>>sometimes subtle shift in emphasis of requirements and features.
> 
> 
> Maybe yes and no?
> 
> 1.  As an architecture, DITA is a typing system for specialization of
> elements, integration of design modules, and so on.
> 
> 2.  As a specific type hierarchy, DITA seeds the architecture with a base
> specialization module, derives core specialization modules, and assembles
> core vocabularies for the problem space of human-readable content.
> 
> The core declaration modules and DTDs are an attempt to conform to the DITA
> architecture within the limits of DTD syntax.  For instance, the class and
> domains attributes exist exclusively to support processing.  Similarly, the
> entity design patterns exist exclusively to support integration of modules
> as vocabularies.
> 
> As a specific type hierarchy, DITA has to be more concerned with
> authorability and readability than, say, SOAP because DITA content in the
> core problem space is, fundamentally, a communication from author to
> reader.
> 
> Are concerns with readability and authorability restricted to the
> declaration level?  Couldn't those concerns be legitimate issues for
> abstract types?

They are concerns for the abstract types but they are not *primary* 
concerns. That is, the abstract type design should prefer precision and 
consistency within the architecture to authorability. By the same token, 
concrete document types can prefer authorability over precision by 
taking advantage of the specialization mechanism to map from the 
abstraction to the concrete.

So I'm not saying that the DITA-defined abstract types should ignore 
authoring concerns but they should not be driven by them.

That is one of the big advantages of an architecture mechanism--it 
provides for clear separation of concerns and avoids having 
implementation details impinge on the core design while providing the 
freedom for implementors to do what they need to do to meet pragmatic needs.

Cheers,

E.
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com