Subject: Re: [dita] Topic vs Domain Specialization
From: "W. Eliot Kimber" <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Tue, 14 Dec 2004 13:57:55 -0600
Michael Priestley wrote:

> Hi Eliot,
> 
> 
>>In particular, I'm curious as to why DITA needs the domain declaration 
>>and why the class value distinguishes topic-level specializations from 
>>domain-level specializations through the use of "+" or "-" as the 
>>leading character.
> 
> 
> Conref cannot accurately predict whether two paragraphs have equivalent 
> content models without knowing which domains have been integrated into the 
> two topic types. The domain attribute provides a way for processes to 
> determine when two elements of the same type, possibly even in the same 
> topic type, have different content models. See the topic on "content 
> reference attribute" in the spec for details.

Hmm. OK, I have some serious issues with conref in general, so I'll just 
leave this statement unquestioned for now. [My instinct is that this is 
an unnecessary check for a number of reasons but it's probably not 
productive to have that discussion now.]

> Generalization has different default behavior for domain elements versus 
> type elements. See the topic on generalization in the spec for details.

> This all comes out of the need to share and include markup across 
> structures without declaring new names for the shared elements or 
> impairing reuse of their content. If two topic types have the same element 
> but have created different content models, you need to know how to map 
> them when you reuse between them or generalize from one to the other. 

I think this touches on our earlier discussion around name spaces. My 
contention (and growing conviction) is that two different topic types 
might have the same *local* name for an element but that those elements 
are either in the topic-specific namespace or in a third, 
domain-specific namespace. In that case there can be no conflict or 
ambiguity about the content models associated with specific 
elements--they are invariant for all use contexts of that element. Note 
that this requires that *all* elements be in some defined namespace 
(that is, no element can be in the "no namespace" or otherwise inherit 
its namespace from its use context).
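
A minimal sketch of the "always a namespace" case, using Python's 
standard-library ElementTree parser. The namespace URIs and the element 
name "step" are invented for illustration and are not part of DITA; the 
point is only that two elements with the same local name remain 
distinct once each is namespace-qualified:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
DOC = """
<root xmlns:task="urn:example:topic:task"
      xmlns:ref="urn:example:topic:reference">
  <task:step>Do the thing.</task:step>
  <ref:step>Describes a step.</ref:step>
</root>
"""


def qualified_names(xml_text):
    """Return the (namespace URI, local name) pair for each child of the root."""
    root = ET.fromstring(xml_text)
    pairs = []
    for child in root:
        # ElementTree stores qualified names as "{uri}local".
        uri, _, local = child.tag[1:].partition("}")
        pairs.append((uri, local))
    return pairs


names = qualified_names(DOC)
# Same local name, different namespaces, so the two elements never collide.
assert names[0][1] == names[1][1]
assert names[0] != names[1]
```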

So I think we may have an irreconcilable difference based on this 
differing approach to the management of element names. The current DITA 
practice reflects the "no namespace" case. My proposed practice reflects 
the "always a namespace" case. But that discussion is way out of scope 
for now--I'm just trying to understand why the current spec is the way 
it is.

Also, I think it's the case that any two domain-specific elements in 
a given containment context must have some ancestor type that is valid 
in that context at some level of generalization. 
Therefore, there can't be any absolute issue of contextual validity. 
There can only be a question of local validity in the context of a 
particular specialized document type. But that should only be an issue 
for authoring, which I consider to be a somewhat distinct area of 
concern--from a processing standpoint the processing can always fall 
back to some general type in order to find a type whose semantic it 
understands.
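
The fallback idea can be sketched as follows, assuming class values of 
the form "- topic/topic concept/concept " (topic-level) and 
"+ topic/ph hi-d/b " (domain-level). The function names and the 
HANDLERS table are hypothetical, not anything defined by the spec; this 
only illustrates walking the ancestry from most specific to most 
general until a known type is found:

```python
def parse_class(value):
    """Split a class value into its leading marker and its ancestry list.

    "+ topic/ph hi-d/b " -> ("+", [("topic", "ph"), ("hi-d", "b")])
    """
    tokens = value.split()
    if not tokens or tokens[0] not in ("-", "+"):
        raise ValueError("class value must start with '-' or '+'")
    kind, pairs = tokens[0], tokens[1:]
    return kind, [tuple(pair.split("/", 1)) for pair in pairs]


def nearest_known(value, handlers):
    """Return the most specific module/element type that has a handler."""
    _, ancestry = parse_class(value)
    # The ancestry is listed general-to-specific, so search it in reverse.
    for module, name in reversed(ancestry):
        key = f"{module}/{name}"
        if key in handlers:
            return key
    return None


# Hypothetical processor that only understands the base topic types.
HANDLERS = {"topic/ph": "inline phrase", "topic/topic": "generic topic"}

assert nearest_known("+ topic/ph hi-d/b ", HANDLERS) == "topic/ph"
assert nearest_known("- topic/topic concept/concept ", HANDLERS) == "topic/topic"
```

Note that the lookup itself ignores the leading "+" or "-"; in this 
sketch the marker is parsed but plays no part in finding a fallback 
type.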

[I think what I'm saying here in part is that there are really two 
different degrees of validity: what authors are locally allowed to do 
with respect to the documents they create, which is usually most 
constrained, and what the overall architecture defines as meaningful 
combinations *for processing purposes*, which is much less constrained. 
One of the points of having the specialization hierarchy is that it 
guarantees that every element can be processed in terms of some known 
semantic, even if it is very generic. But there are probably other 
validation contexts, such as interchange between two partners, where 
different levels of constraint need to be agreed upon and enforced--I 
suspect that it is this use case that in part motivates some of the 
domain specialization design.]

> Mappings between topic types or map types and their ancestors are provided 
> in the root-level class attribute; mappings between domains in use and 
> their ancestors (not necessarily in use in the instance, but in use by the 
> document type) are provided in the root-level domains attribute.

OK, this makes some sense, although I'm not sure that I would have done 
it that way.

That is, I'm understanding this to mean that there are two essentially 
(or explicitly) distinct specialization hierarchies: the topic 
specialization hierarchy, which governs what I've been calling topic 
elements, and the domain specialization hierarchy, which governs all 
content elements.

I think that this distinction is a useful one and I do see the practical 
problem of how to declare the domain type hierarchy in some root place.

> Hopefully this explains why we have the topic on "information types and 
> domains" (to be renamed to "structural specialization and domain 
> specialization") in the "Specialization in content" section, rather than 
> in the "Specialization in design" section (per your original request to 
> separate out the use of specialization in document instances from the 
> syntax tricks that we require for design modules to be reusable across 
> document types).

I think I am satisfied that the topic/domain distinction is appropriate 
and it does explain the class/domain distinction. I think part of the 
confusion on my part was that for topics, the class= attribute alone is 
sufficient to communicate the type hierarchy, but for domains, it does 
not appear to be (but I have to think about this some more).

So I think part of what I might be trying to work toward is this 
refinement in how to think about specialization:

- There is a fundamental specialization mechanism, as reflected in the 
class attribute and the general specialization rules. This basic 
mechanism is independent of the context in which it is used (topic, 
domain, or map).

- There are two (or three) distinct and largely orthogonal 
specialization hierarchies in DITA: topic types and domain types (and, I 
think, map types)

- There may be different processing implications for elements depending 
on which specialization hierarchy they are in (I'm not willing to say 
that there definitely *are* different processing implications because my 
instinct says that there shouldn't be, but I am willing to concede that 
there might be).

I think there's a latent issue here, which we kind of danced around in 
the namespace discussion and that definitely needs to be held off until 
2.0, which is how to name and refer to topic types and domain types in 
some unambiguous, global way. I think that the existence of the 
domains= attribute is a side effect of that issue. Again, my instinct is 
that either the domains attribute really isn't needed, given an 
appropriate naming mechanism, or that there needs to be an analogous 
attribute for topic types. But this is just an instinct at this point.

I'm also working partly from the hypothesis that the XML namespace 
mechanism gives us all the naming tools we need to address these issues. 
So far my experiments have supported this hypothesis to my satisfaction, 
but obviously it will be for others to judge once I can present some 
solid results.

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com
References:
- Re: [dita] Topic vs Domain Specialization
  - From: Michael Priestley <mpriestl@ca.ibm.com>