Subject: Use of XInclude or Its Moral Equivalent in DITA
From: Eliot Kimber <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Tue, 22 Jun 2004 12:05:20 -0500

Like most, if not all, W3C XML-related specifications, XInclude is
intended to meet the requirements of *delivery*, which it does quite well.

However, like most W3C specifications, it does not satisfy the 
requirements of *authoring*.

DITA's value is primarily in its support of authoring and rendition, 
with the clear intention that delivery requirements be met by 
delivery-specific forms generated from DITA source.

Given that, it follows that XInclude, as it is currently specified, is 
almost certainly not appropriate for direct use with DITA.

However, I think there are simple ways to create an 
authoring-appropriate form of XInclude that would be consistent with the 
general DITA requirements and the current DITA design while providing a 
direct path to standard XInclude via a trivial transformation.

NOTE: In the following discussion I use the term "element type" to mean 
"base element type or supertype".

We can summarize the current DITA requirements for use-by-reference as:

1. Must be able to create *semantic* references to elements in order to 
establish the logical content of compound documents. That is, 
use-by-reference relationships that are resolved at the application 
processing level, not at the XML parsing level.

2. Must be able to control where references to specific element types 
can occur within documents.

3. Must enable the validation of constraints on referenced elements, for 
example, that the referenced element conforms to the content constraints 
in force at the point of reference.

4. The addressing scheme and/or processing model must allow elements 
within separate source documents to have the same element identifiers 
(that is, the system cannot require a global element ID name space for 
documents as authored).

Given these requirements, XInclude *as currently specified* can satisfy
only requirement 1: XInclude clearly provides a mechanism for
creating semantic use-by-reference using element-to-element hyperlinks.

However, XInclude currently provides no way to specialize the element
type used for representing references. That is, it provides no typing
mechanism such as that used in XLink or in DITA itself. Nor does it
provide a way to add references to existing element types with some
other base semantic. Requirement 2 is therefore not met.

XInclude does not define any mechanism for expressing constraints on the
reference targets, so it cannot satisfy requirement 3 either.

For requirement 4, XInclude as currently written is somewhat ambiguous,
at least in my opinion: Because the abstract processed result is a
single-document infoset, all the *result* IDs must be unique. However,
it can be read as also allowing the rewriting of IDs in the process of
constructing the result infoset. In any case, rewriting IDs is a
requirement for using XInclude for authoring unless you are willing to
impose a global ID name space over all documents that are candidates
for re-use. That is of course impossible in the general case, since the
set of candidate documents cannot be completely known at any given time
except in the context of the most controlled content management system.

The DITA spec currently addresses these requirements with the general 
conref= attribute, which provides a general mechanism for creating 
semantic use-by-reference. Because it is an attribute on any element, 
requirement 2 is satisfied. The currently defined rules for what a given
element is allowed to refer to satisfy requirement 3. Requirement 4 is
satisfied by not using XML IDs for element addressing and providing a 
DITA-specific addressing syntax.

Clearly DITA cannot use XInclude directly as specified. We could (and 
probably should) work with the XInclude working group to try to enhance 
XInclude to provide the features needed to support authoring, but this 
can't possibly happen before the current delivery-only XInclude spec 
becomes a recommendation.

Therefore the best DITA could do in the short term is to define a 
conref= replacement that is consistent with XInclude such that it 
reflects the semantics of XInclude exactly and uses the syntax of 
XInclude as much as possible.

I have done this type of thing in production document types I've created 
for various clients (as described in my XML Europe 2004 paper).

The approach is pretty simple:

1. Use some sort of typing mechanism to map specialized element types to 
the base xi:include type. For DITA we could simply re-use the existing 
DITA typing mechanism and make xi:include a built-in core type.

This satisfies requirements 1 and 2 and provides an obvious and direct 
mapping from the DITA-specific mechanism to standard XInclude.
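
To make the "trivial transformation" concrete, here is a minimal sketch
in Python. It assumes, purely for illustration, that a specialized
reference element announces its base type with a class= token of
"xi/include" and carries href= and xpointer= attributes; the token, the
para-ref element, and the function name are all hypothetical and not
part of DITA or XInclude.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"
# Hypothetical class token marking an element as a specialization of xi:include.
XI_CLASS_TOKEN = "xi/include"


def to_xinclude(root):
    """Rename every element whose class= names the xi:include base type."""
    for elem in root.iter():
        if XI_CLASS_TOKEN not in elem.get("class", "").split():
            continue
        href = elem.get("href")
        xpointer = elem.get("xpointer")
        elem.tag = f"{{{XI_NS}}}include"
        # Drop authoring-only attributes; keep just the address.
        elem.attrib.clear()
        if href is not None:
            elem.set("href", href)
        if xpointer is not None:
            elem.set("xpointer", xpointer)
    return root


SOURCE = """<topic id="t1">
  <para-ref class="- xi/include para-ref " href="yourdoc.xml" xpointer="element(para-04)"/>
  <para class="- para ">Local content.</para>
</topic>"""

ET.register_namespace("xi", XI_NS)
root = to_xinclude(ET.fromstring(SOURCE))
result = ET.tostring(root, encoding="unicode")
```

The specialized element name is what authors and DTDs see; only the
generated delivery form contains literal xi:include elements.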

The one potential wrinkle here is that in the current DITA design a 
given element type may be either a content container or a content 
reference. My personal preference is that referencing element types 
should always be distinct from the content-containing element types, 
mostly to make it clearer to authors when they are creating a reference 
and when they are not. It also avoids the problem of what to do when an 
element contains both content and makes a use-by-reference. DITA 
addresses this by imposing the business rule that elements that use 
conref= must have empty content, but there is no way to express or 
enforce this rule using normal DTD or schema constraint specifications. 
[In HyTime we simply said that if an element has content and establishes 
a use-by-reference relationship, the local content is "ignored" in the 
resolved result, but that can really confuse authors in practice, which 
is why I prefer to avoid the whole issue.]
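
Since the empty-content rule cannot be stated in a DTD, it has to be
checked by application code. The following is a sketch of such a check,
not anything the DITA tools are known to ship; the function name and
sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def conref_violations(root):
    """Return elements that carry conref= but also have local content."""
    bad = []
    for elem in root.iter():
        if elem.get("conref") is None:
            continue
        has_text = bool((elem.text or "").strip())
        has_children = len(elem) > 0
        if has_text or has_children:
            bad.append(elem)
    return bad


DOC = """<topic id="t1">
  <para conref="yourdoc.xml#/topic2/para-04"/>
  <para conref="yourdoc.xml#/topic2/para-05">Stray local text.</para>
  <para>Ordinary content.</para>
</topic>"""

violations = conref_violations(ET.fromstring(DOC))
```

With distinct referencing element types declared EMPTY, the DTD itself
would enforce this and no such check would be needed.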

For requirement 3, either continue to just define the reference 
constraints in prose or define additional elements or attributes that 
can explicitly define the reference type constraints. In my designs I 
use an attribute called "reftype=" that, at a minimum, takes the name of 
the required target element. It could also take, for example, an XPath 
expression that defines the allowed referents, possibly in terms of the 
value of a type-mapping attribute, e.g.:

  <para conref="yourdoc.xml#/topic2/para-04"
        reftype="//*[contains(@class, ' para ')]"/>
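
A sketch of the minimum case, where reftype= just names the required
target element type, might look like the following in Python. It
assumes that class= lists the element's supertypes as bare
space-separated names, as in the example above; evaluating a full XPath
in reftype= would need a real XPath engine and is not attempted here.
The function name and the warning-para element are hypothetical.

```python
import xml.etree.ElementTree as ET


def satisfies_reftype(target, reftype):
    """True if the target's own name or any type listed in class= equals reftype."""
    if target.tag == reftype:
        return True
    return reftype in target.get("class", "").split()


# A specialized paragraph that declares "para" as one of its supertypes.
target = ET.fromstring(
    '<warning-para id="para-04" class=" para warning-para ">Text.</warning-para>'
)

ok_as_para = satisfies_reftype(target, "para")
ok_as_list = satisfies_reftype(target, "list")
```

Splitting class= into tokens avoids the false matches a plain substring
test would give for names that share a prefix.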

For requirement 4, I recommend simply doing ID and ID reference rewriting
when resolving use-by-reference links. There is no great difficulty in 
implementing this. The only wrinkle is that the rewriting code must be 
DTD-specific (or at least provide a mechanism for doing per-DTD 
configuration) as XSLT processors provide no way to know which 
attributes are in fact ID or IDREF values. But in the context of DITA, 
we can provide this functionality as part of the DITA tool kit.
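
As a rough illustration of the rewriting step, here is a Python sketch.
The per-DTD configuration is reduced to two sets of attribute names;
those sets, the refid= attribute, the "r1." prefix, and the function
name are assumptions made for the example, not DITA definitions.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical per-DTD configuration: which attributes are ID and IDREF typed.
ID_ATTRS = {"id"}
IDREF_ATTRS = {"refid"}


def rewrite_ids(subtree, prefix):
    """Return a copy of subtree with its IDs, and references to them, prefixed."""
    clone = copy.deepcopy(subtree)
    local_ids = {
        e.get(a) for e in clone.iter() for a in ID_ATTRS if e.get(a) is not None
    }
    for elem in clone.iter():
        for name in ID_ATTRS:
            if elem.get(name) is not None:
                elem.set(name, prefix + elem.get(name))
        for name in IDREF_ATTRS:
            # Only retarget references that point inside the copied subtree.
            if elem.get(name) in local_ids:
                elem.set(name, prefix + elem.get(name))
    return clone


HOST = ET.fromstring('<topic id="t1"><para id="p1">Host paragraph.</para></topic>')
REUSED = ET.fromstring(
    '<section id="s1"><para id="p1">Reused.</para>'
    '<xref refid="p1"/><xref refid="elsewhere"/></section>'
)

# Both documents use id="p1"; the prefix keeps the merged result unique.
HOST.append(rewrite_ids(REUSED, "r1."))
all_ids = [e.get("id") for e in HOST.iter() if e.get("id")]
```

A real implementation would generate a distinct prefix for each
resolved reference so that the same element can be used more than once.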

By doing ID rewriting, you can then use standard addressing mechanisms 
(i.e., XPointer) rather than having to define, maintain, and implement a 
DITA-specific addressing mechanism.

Finally, XInclude breaks element addresses into two attributes: href=
for pointing to entire documents and xpointer= for pointing to 
individual elements. I think this is a good design and prefer it over a 
single href=, for all the reasons stated in the XInclude spec.

This address partitioning approach could provide a way for DITA to have 
its own mechanism for addressing elements while also allowing the use of 
XPointer. That is, if href= attributes never have fragment identifiers, 
then DITA could provide both an xpointer= attribute as well as a 
"dita-pointer=" attribute, which would take the current DITA-specific 
element pointers (sequences of element IDs). This would make the 
semantics of the addresses much clearer and avoid confusion of 
DITA-specific pointers with XPointers or XPaths, from which they are
syntactically indistinguishable.
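
To show what a processor would do with a dita-pointer= value once href=
has located the document, here is a small Python sketch. It assumes the
pointer is a slash-separated sequence of element IDs, each looked up
within the element matched by the previous step, so that an ID need
only be unique within its containing topic. The function name and the
sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_dita_pointer(root, pointer):
    """Follow a sequence of element IDs, each scoped to the previous match."""
    current = root
    for index, step in enumerate(pointer.strip("/").split("/")):
        match = None
        for elem in current.iter():
            # After the first step, only strict descendants of the previous match count.
            if index > 0 and elem is current:
                continue
            if elem.get("id") == step:
                match = elem
                break
        if match is None:
            return None
        current = match
    return current


DOC = ET.fromstring("""<dita>
  <topic id="topic1"><para id="para-04">First topic.</para></topic>
  <topic id="topic2"><para id="para-04">Second topic.</para></topic>
</dita>""")

hit = resolve_dita_pointer(DOC, "/topic2/para-04")
miss = resolve_dita_pointer(DOC, "/topic3/para-04")
```

Read as an XPath, "/topic2/para-04" would mean something entirely
different (child elements named topic2 and para-04), which is the
confusion a separate attribute avoids.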

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com