Subject: DITA-aware Importer Implemented for XIRUSS-T
From: Eliot Kimber <ekimber@innodata-isogen.com>
To: DITA TC list <dita@lists.oasis-open.org>
Date: Wed, 30 Jun 2004 23:29:54 -0500

I have implemented a basic DITA-aware importer for the XIRUSS-T system
and provided enough packaging and documentation that you should be able
to run it if you have ANT or Eclipse installed. The XIRUSS-T system is a
GPL-licensed open-source project for demonstration and educational
purposes *only*. It is not a product and is expressly not appropriate
for production use.

This code is relevant to the DITA project for at least the following
reasons:

- It demonstrates (I hope) why having schemas bound to namespaces is
important. Without this, content management systems (and any other
generalized XML-aware system) cannot reliably associate
document-type-specific processing with documents because there's no
other way to know for sure that a given document is governed by a
particular document type unless a human provides that association on a
case-by-case basis. In the case of XIRUSS-T it is the use of
DITA-specific namespaces that allows my code to reliably and
automatically bind DITA documents to the DITA-aware importer in order to
import maps and conrefing documents as complete compound documents.

- It provides an example of generic, DITA-aware content management
functionality that takes direct advantage of the DITA architecture
mechanism. I haven't had a chance to test it yet, but the DITA importer
should handle any document that derives from the DITA map or topic
document types.

- It provides an open-source sandbox with which others can experiment
with more sophisticated or specialized DITA-aware processing. For
example, it would probably be useful to build additional modules that
support the processing of related links and other sophisticated linking
features of DITA. Likewise, if you've created a task-specific
specialized DITA-based document type it should be fairly clear how to
quickly specialize the existing code to support your unique import
requirements.

The overall XIRUSS-T project site is http://xiruss-t.sourceforge.net.
From there you can download the code distribution. To try it, just
download the Zip file, unpack it somewhere (e.g., into a directory
called "xiruss-t") and either do "ant xirussRunner" to start the
XIRUSS-T server or set up an Eclipse package per the instructions on the
Web site.

When you start the server it automatically imports a bunch of files,
including a couple of DITA maps that in turn refer to a bunch of topics,
one of which does a conref=. The result is that all the files rooted at
the maps are imported and show up in the repository with the appropriate
dependencies captured.

NOTE: This uses my modified namespace-based DITA schemas and instances.
However, all I've changed is to add namespace declarations. Otherwise
the documents are pure DITA.

The XIRUSS system provides a general importer framework and that
framework is used to implement the DITA-specific importer. The main
classes involved are com.innodata.xiruss.bos.xml.dita.DitaBosMember,
com.innodata.xiruss.bos.xml.XmlBosMember (of which DitaBosMember is a
subclass), and com.innodata.xiruss.bos.BosMemberFactory, which
constructs DitaBosMember objects for XML documents that use one of the
DITA namespaces I've defined (either DITA/map or DITA/base).
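
The namespace-based binding described above can be illustrated with a
minimal sketch. This is not the actual BosMemberFactory code: the class
name, the method names, and the "urn:example:..." namespace URIs are
placeholders invented for illustration, since the real namespace names
are not given in this message. It only shows the idea of choosing an
importer from the namespace of the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch; not the XIRUSS-T BosMemberFactory.
class NamespaceDispatchSketch {
    // Placeholder URIs: the real experimental DITA namespaces are assumptions here.
    static final Set<String> DITA_NAMESPACES =
        Set.of("urn:example:DITA/map", "urn:example:DITA/base");

    /** Returns the namespace URI of the root element, or null if it has none. */
    static String rootNamespace(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getNamespaceURI() to work
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
    }

    /** Picks an importer label based only on the root element's namespace. */
    static String chooseImporter(String xml) {
        String ns = rootNamespace(xml);
        return ns != null && DITA_NAMESPACES.contains(ns) ? "dita" : "generic-xml";
    }

    public static void main(String[] args) {
        System.out.println(chooseImporter("<map xmlns='urn:example:DITA/map'/>")); // dita
        System.out.println(chooseImporter("<map/>"));                              // generic-xml
    }
}
```

Note that the un-namespaced <map/> falls through to the generic case:
the element name alone is not treated as evidence that the document is
DITA.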

It is the DitaBosMember class that encapsulates knowledge of DITA
topicref elements and conref= attributes.
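
As a rough illustration of what "knowledge of topicref elements and
conref= attributes" might involve, the sketch below collects the files a
DITA document points to. It is an assumption-laden example, not the
DitaBosMember implementation: the class and method names are
hypothetical, it matches only elements literally named "topicref" (a
real implementation would presumably also recognize specialized
elements), and the namespace URI in the sample is a placeholder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the XIRUSS-T DitaBosMember.
class DitaReferenceSketch {
    /** Drops any "#fragment" so that only the target file remains. */
    static String stripFragment(String ref) {
        int hash = ref.indexOf('#');
        return hash < 0 ? ref : ref.substring(0, hash);
    }

    /** Returns the distinct files referenced by topicref/@href or by any @conref, in document order. */
    static List<String> referencedFiles(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> targets = new LinkedHashSet<>();
            NodeList all = doc.getElementsByTagNameNS("*", "*"); // every element
            for (int i = 0; i < all.getLength(); i++) {
                Element e = (Element) all.item(i);
                if ("topicref".equals(e.getLocalName()) && e.hasAttribute("href")) {
                    targets.add(stripFragment(e.getAttribute("href")));
                }
                if (e.hasAttribute("conref")) {
                    targets.add(stripFragment(e.getAttribute("conref")));
                }
            }
            targets.remove(""); // same-document references are not file dependencies
            return new ArrayList<>(targets);
        } catch (Exception ex) {
            throw new IllegalArgumentException("not parseable as XML", ex);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only; not one of the sample files shipped with XIRUSS-T.
        String map = "<map xmlns='urn:example:DITA/map'>"
                   + "<topicref href='changingtheoil.xml'/>"
                   + "<topicref href='organizing.xml'/></map>";
        System.out.println(referencedFiles(map));
    }
}
```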

To see the results of the import, start the server and then open a Web
browser to "http://localhost:9090/". You will see three links:
resources, branches, and Repository dump.

If you select branches and then the "dita stuff" branch, you will see
two snapshots. The first snapshot reflects the import of
"simple.ditamap", the second the import of "hierarchy.ditamap". If you
navigate to a snapshot and then a version you can see the content of a
file as the browser renders it (i.e., as an XML file).

If you go to the repository dump you can see all the information and
meta-information actually stored in the repository. The main repository
page shows all the resources in the repository, all the versions (each
version is associated with exactly one resource; a resource may have
many versions), the branches, and the repository schema registry, which
maps namespaces to schema instances. You should see the two DITA-related
namespaces (which I made up for this experiment) mapped to map.xsd and
ditabase.xsd (you'll have to navigate to the resource and then the
version to see the original filename).

If, through the repository dump, you navigate to the "dita stuff" branch
and then to one of the snapshots, you can see the results of the
dependencies established during import, which are used to reflect the
"where-used" information. For example, if you find the entry for
"organizing.xml" you'll see that it's used by "changingtheoil.xml",
reflecting the conref= I created in changingtheoil. Likewise, if you go
to the second snapshot within the "dita stuff" branch, you'll see that
each of the topics is shown as being used by hierarchy.ditamap or
simple.ditamap. Note that in the second snapshot some files appear to be
in the repository twice. This is because on import you have to
explicitly indicate that a given file is in fact a new version of an
existing resource. I haven't done this for the import of
hierarchy.ditamap (in practice this would be done either through
use-case- or business-rule-specific heuristics, such as filename
matching or CVS-like conventions, or through an interactive user
interface for doing imports).

If you navigate to a version you will see all the properties of that
version. For XML documents this includes XML-specific properties such as
the root element type, namespaces used, and governing schema, if any.

You will also see any dependencies from the version to other resources
(in XIRUSS dependencies are always from versions to resources). For
example, if you go to the version for hierarchy.ditamap you will see
that there is one use-by-reference dependency for each topic referenced
from the map, as well as a governed-by dependency to its governing schema.
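
The relationship between these stored dependencies and the "where-used"
view can be sketched as a simple inversion: dependencies are recorded
from the using version to the used resource, and where-used reads the
same data in the other direction. The code below is a hypothetical
illustration, not the XIRUSS-T data model; filenames stand in for
version and resource identifiers, and the map entry in the sample data
is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of deriving where-used information from stored dependencies.
class WhereUsedSketch {
    /**
     * Inverts "version -> resources it uses" into "resource -> versions that use it".
     */
    static Map<String, SortedSet<String>> whereUsed(Map<String, List<String>> usesByVersion) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : usesByVersion.entrySet()) {
            for (String target : entry.getValue()) {
                index.computeIfAbsent(target, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = new LinkedHashMap<>();
        // The conref from changingtheoil.xml to organizing.xml is described in the message.
        uses.put("changingtheoil.xml", List.of("organizing.xml"));
        // Illustrative assumption: which topics this map references is not listed here.
        uses.put("hierarchy.ditamap", List.of("changingtheoil.xml"));
        System.out.println(whereUsed(uses));
    }
}
```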

Also, for each version you can see the source bytes and, if the version
is a text object, the text.

Finally, if you examine any XML versions that include links, you will
notice that the link address URLs have been rewritten to point to
resources in the repository, reflecting the location of the target as
imported. URLs of the form "res_00000026~onSnapshot" are relative URLs
that are resolved relative to a specific snapshot. For example, to
resolve a resource to a version on snapshot snap_00000007 you would use
this fully-qualified URL:

http://localhost:9090/snap_00000007/res_00000026~onSnapshot

(Try this in almost any XML editor--it should just work, including
fetching the schema. Unfortunately I haven't yet implemented rewriting
of stylesheet PIs to point into the repository, so browsers may complain
when you try to view some XML files.)
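
The resolution of a snapshot-relative URL can be sketched with ordinary
relative-URI resolution, using the standard java.net.URI class. The
class and method names below are hypothetical; only the URL shapes are
taken from the example above. The trailing slash on the snapshot
segment matters, because without it the relative reference would
replace the snapshot ID rather than be appended to it.

```java
import java.net.URI;

// Hypothetical helper; shows plain RFC-style relative-URI resolution.
class SnapshotUrlSketch {
    /** Resolves a relative reference against a given snapshot on a given server. */
    static URI resolveOnSnapshot(String serverBase, String snapshotId, String relativeRef) {
        // Build "<server>/<snapshot>/" first; the trailing slash makes it a directory-like base.
        URI snapshotBase = URI.create(serverBase).resolve(snapshotId + "/");
        return snapshotBase.resolve(relativeRef);
    }

    public static void main(String[] args) {
        System.out.println(resolveOnSnapshot(
            "http://localhost:9090/", "snap_00000007", "res_00000026~onSnapshot"));
        // prints http://localhost:9090/snap_00000007/res_00000026~onSnapshot
    }
}
```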

Cheers,

Eliot
-- 
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122

eliot@innodata-isogen.com
www.innodata-isogen.com