Subject: Initial Experience With Current DITA Schemas
From: Eliot Kimber <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Tue, 22 Jun 2004 19:00:41 -0500

Executive Summary:

1. I feel strongly that we must define one or more (probably just one) 
namespace for DITA in the 1.0 release (we may have already decided this).

2. The current DITA schemas must be reworked to declare this namespace 
as their target namespace.

-----------------------------------
How I Arrived At These Conclusions:

I have started a little research project, XIRUSS-T 
(xiruss-t.sourceforge.net), with the aim of both demonstrating various 
principles and techniques for versioned content management of compound 
documents that use XInclude (or its moral equivalent) for re-use and
providing a sandbox for experimentation. Part of the focus of the system 
is techniques for doing compound document import, as that is where much 
of the complexity of compound document management lies.

A fundamental design feature of the XIRUSS system is that it only 
supports the use of schemas, not DTDs, in that if you import a document 
that uses a DTD it does nothing with the DTD reference and will not 
import any external DTD subset or parameter entities into the 
repository. However, if the document uses a schema, it will also import 
the schema (if not already in the repository) and will maintain as 
object metadata the dependency between the document and its governing 
schemas, as well as a mapping from namespaces to schemas.

In addition, the import process uses the document's namespaces to
determine what, if any, schema-specific import processing to apply to 
the document. For example, my code currently has an XSLT importer that 
recognizes XSLT documents and applies an XSLT-specific importer to them 
in order to import all the member documents of a multi-document XSLT 
transform, as well as base-level support for XInclude include references.
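
The namespace-driven dispatch described above can be sketched roughly as follows. This is only an illustration, not the actual XIRUSS code: the registry, the importer names, and the choice to look only at the root element's namespace are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical registry: namespace URI -> name of a schema-specific importer.
IMPORTERS = {XSLT_NS: "xslt-importer"}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        # ElementTree spells qualified names as "{namespace-uri}localname".
        return root.tag[1:].split("}", 1)[0]
    return None


def choose_importer(xml_text):
    """Pick an importer by root namespace, falling back to a generic one."""
    return IMPORTERS.get(root_namespace(xml_text), "generic-importer")
```

A document with no namespace at all falls through to the generic importer, because there is nothing to key the lookup on.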

I was trying to implement an importer for DITA documents. 
Computationally the problem is simple: just find all the topicrefs, 
elements with conref= attributes, and so on, and chase them down. My 
importer framework provides a simple model for doing this processing, 
making it a matter of a few minutes to implement a new importer of this 
sort.
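
As a rough sketch of the "find the references and chase them down" step, assuming documents with no namespace (as the current schemas produce) and matching only the literal element name topicref and the conref= attribute: the function name and sample documents are made up for the example, and specialized element types are not handled.

```python
import xml.etree.ElementTree as ET


def find_references(xml_text):
    """Return (kind, target) pairs for topicref href= and conref= attributes."""
    root = ET.fromstring(xml_text)
    refs = []
    for elem in root.iter():  # document order, root included
        if elem.tag == "topicref" and elem.get("href"):
            refs.append(("topicref", elem.get("href")))
        if elem.get("conref"):
            refs.append(("conref", elem.get("conref")))
    return refs


# Hypothetical sample documents, for illustration only.
SAMPLE_MAP = """<map>
  <topicref href="intro.xml">
    <topicref href="details.xml"/>
  </topicref>
</map>"""

SAMPLE_TOPIC = """<topic id="t1">
  <title>Example</title>
  <body><p conref="shared.xml#shared/note1"/></body>
</topic>"""
```

An importer would then resolve each target relative to the referencing document and repeat the process on whatever it finds there.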

The problem I ran into was the way the current DITA schemas are defined.

As provided in both the IBM distribution and the OASIS submissions the 
DITA schemas do *not* have a target namespace. A document that uses the
DITA schemas does not declare any namespace for DITA; it just uses
the noNamespaceSchemaLocation= attribute to point to the schema file.
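
To make the situation concrete, here is a small sketch that inspects such an instance. The sample document and the file name topic.xsd are hypothetical; the point is only that the root element carries no namespace, so the schema pointer is the sole link to anything DITA-specific.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def describe_schema_binding(xml_text):
    """Report the root namespace and any xsi:noNamespaceSchemaLocation value."""
    root = ET.fromstring(xml_text)
    namespace = None
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    return {
        "root_namespace": namespace,
        "no_namespace_schema": root.get("{%s}noNamespaceSchemaLocation" % XSI_NS),
    }


# Hypothetical instance in the current no-namespace style.
CURRENT_STYLE = """<topic id="t1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="topic.xsd">
  <title>Example</title>
</topic>"""
```

For this sample the root namespace comes back as None, which leaves a namespace-keyed importer with nothing to match on.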

This exactly mirrors the way DTDs are used in XML and also demonstrates 
the reason that I chose *not* to support DTDs in XIRUSS: there is 
nothing about either the reference to the schema instance or the schema 
itself that enables a reliable mapping from an instance document to an 
abstract "document type" (that is, the set of business rules that govern 
a set of documents and their processing).

This means that the current DITA schemas are just a set of syntactic 
constraints with no defined association to any abstract set of rules 
(such as the DITA specifications). Doh!

This means that the XIRUSS system, in addition to not supporting DTDs, 
also can't support schemas with no target namespace. [I can support 
documents that have no global namespace as long as they use pure
XInclude for use-by-reference, but I can't associate
schema-specific processing with those documents.] I hadn't properly 
appreciated this until now. Doh!

What this really comes down to is that in order for a document to be 
unambiguously associated with a set of business rules it *must* declare 
a root namespace. Because the namespace spec explicitly says that one
cannot presume that a namespace name relates to any particular schema 
(in the generic sense), to be completely clear the namespace must be 
associated with a schema, which can be done in any one of three ways
(schemaLocation= in instances, targetNamespace in schema documents, or 
through an application-specific namespace-to-schema mapping). The schema 
then becomes the physical representative of the larger set of business 
rules and their definitions that makes up a complete document type. The 
namespace the schema governs then becomes one true name for that 
document type.
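
The three association mechanisms listed above could be combined along these lines. This is a sketch under stated assumptions: the function names, the dictionary-based repository and mapping, and the order in which the mechanisms are tried are all choices made for the example, not something the schema or namespace specs prescribe.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_location_pairs(instance_text):
    """Parse xsi:schemaLocation into a {namespace: schema URI} dict."""
    root = ET.fromstring(instance_text)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # The attribute value is a whitespace-separated list of (namespace, URI) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def target_namespace(schema_text):
    """Return the targetNamespace declared by a schema document, if any."""
    return ET.fromstring(schema_text).get("targetNamespace")


def resolve_schema(namespace, instance_text, known_schemas, app_mapping):
    """Try the three mechanisms in turn (this ordering is an assumption).

    known_schemas: {schema URI: schema document text}
    app_mapping:   {namespace: schema URI}, the application-specific mapping
    """
    hinted = schema_location_pairs(instance_text)
    if namespace in hinted:
        return hinted[namespace]
    for uri, text in known_schemas.items():
        if target_namespace(text) == namespace:
            return uri
    return app_mapping.get(namespace)
```

Note that none of the three lookups can succeed for a document whose elements are in no namespace, since there is no namespace name to look up.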

This suggests to me that DITA must define at least one namespace and 
must associate its schemas with that namespace. Without this there is no 
way to unambiguously know that a given document is in fact a DITA 
document (or formally derived from the DITA architecture).

So I tried the experiment of putting all the various DITA schema files 
in a single namespace. This worked for the purposes of validating the 
documents (at least Turbo XML was happy; Stylus Studio 4.6 was not, but I
suspect that this old version of Stylus is just ignorant). But it
tripped over some shortcomings in my current XSD importer process. The 
import worked to the degree that I was able to import all the topics 
directly or indirectly referenced by a map but the schema associations 
got a bit confused for reasons I won't bore you with.

However, I wasn't completely happy with this schema design: I suspect 
that it would actually be a more accurate reflection of the true 
abstract DITA architecture to have distinct namespaces for the different 
layers of types and then, if necessary, use schema-level derivation to 
map base names in one namespace to the same name in the namespace of a 
specialized schema.

I tried doing this with the current schemas and it didn't work at all, 
although I suspect that this was in part because Stylus, which I was 
using to edit the schemas, doesn't give me accurate validation feedback 
(reporting problems that aren't really problems). But it did start to 
feel like the current organization of the types just isn't right, in 
that it reflects the *syntactic* organization imposed by using parameter 
entities in DTDs and not the true specialization hierarchy of the 
abstract DITA architecture.

I think we need to think very carefully about how namespaces will be 
used in both DITA 1.0 and 1.+.

Cheers,

E.
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com