
Subject: Managing DITA DTDs - file naming issues.
From: Robert D Anderson <robander@us.ibm.com>
To: dita@lists.oasis-open.org
Date: Mon, 2 May 2005 10:54:30 -0500

Hello all,

This question was also posted to the dita-users mailing list. For those who
do not follow the Yahoo list, we are looking for any recommendations on DTD
naming schemes to help manage the proliferation of DTDs with new
specialized domains and topic types. While the naming issue was
specifically left out of the DITA spec, we're guessing that this issue will
come up for others in this group, so we're looking for input here as well.
Thanks for any suggestions.
-----------------

This question relates tangentially to Chris Wong's question last week about
DTD practices, but does not exactly address it, so I thought I'd start a
new thread.

Within IBM, we're anticipating the upcoming proliferation of topic and
domain specializations. Until now, all DTDs have used pretty simple system
and public identifiers - for example, concept.dtd refers to concept with
all supported domains. However, if every concept topic uses that, how do
you know what to use for validation? A concept with no domains, with all of
the OASIS domains, or a concept with the OASIS domains plus one of mine?

This boils down to one central problem. I would like to be able to look at
a file and figure out which specialization modules were used to create it
(and therefore which modules are needed to validate and process it). The
only way this can be done is by having a unique public ID, system ID, or
both -- unless we use a hack like storing the DTD info in a PI, but I don't
want to start a debate here over the value of PIs.
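
To make "look at a file" concrete, here is a minimal Python sketch that pulls
the root element name, public ID, and system ID out of a document's DOCTYPE
declaration. The function name doctype_ids and the regular expression are
illustrative assumptions, not part of any DITA tool; a real implementation
would use a proper XML parser, since this sketch does not skip comments or
handle internal subsets.

```python
import re

# Hypothetical helper: matches either
#   <!DOCTYPE root PUBLIC "public-id" "system-id">
# or
#   <!DOCTYPE root SYSTEM "system-id">
DOCTYPE_RE = re.compile(
    r"""<!DOCTYPE\s+(?P<root>[^\s>\[]+)\s+
        (?:PUBLIC\s+(?P<q1>["'])(?P<public>.*?)(?P=q1)\s+|SYSTEM\s+)
        (?P<q2>["'])(?P<system>.*?)(?P=q2)""",
    re.VERBOSE | re.DOTALL,
)


def doctype_ids(xml_text):
    """Return (root, public_id, system_id), or None if no DOCTYPE is found.

    public_id is None when the declaration uses SYSTEM only.
    """
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    return match.group("root"), match.group("public"), match.group("system")
```

If every file declares the same generic identifiers (for example, a bare
concept.dtd), this lookup tells you nothing about which domains are in play,
which is exactly the problem described above.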

There are several distinct problems to worry about. How should you identify
each of these:
1. A concept with 6 domains, when there are 10 domains in use at your site?
2. A concept that cannot nest itself (or anything else)?
3. A concept that nests only reference, when reference then nests itself?
4. A concept that nests (task or reference), when reference nests (topic),
and task nests nothing?

We're now considering using a form of regular expression syntax as part of
both the public and system IDs. The current proposal is rather simplistic -
it uses parentheses to show what nests (no parentheses means something
nests itself). When there are multiple children, or multiple domains, the
values are listed in alphabetical order. I've used the full name for topic
specializations, and two-letter codes for the domains; these are taken from
the values used inside the class attribute for each specialization. Domains
are tacked on to the end of the system ID, after a pair of dashes to
indicate the end of the topic types; for the public ID, they're just added
with a plus. Here are proposed identifiers for the doctypes described
above:
1. A concept with 6 domains; concept nests itself:
   concept--aa-hi-pr-sw-ui-ut.dtd
   -//IBM//DTD DITA Concept +aa +hi +pr +sw +ui +ut//EN
2. A concept that cannot nest anything (no domains):
   concept(no-topic-nesting).dtd
   -//IBM//DTD DITA Concept (no-topic-nesting)//EN
3. A concept that nests reference, when reference nests only itself - 6
   domains again:
   concept(reference)--aa-hi-pr-sw-ui-ut.dtd
   -//IBM//DTD DITA Concept (Reference) +aa +hi +pr +sw +ui +ut//EN
4. Concept that nests (task or reference), reference nests (topic), same 6
   domains:
   concept(reference(topic),task)--aa-hi-pr-sw-ui-ut.dtd
   -//IBM//DTD DITA Concept (Reference(Topic),Task) +aa +hi +pr +sw +ui +ut//EN
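
As a rough illustration of the proposed scheme, the Python sketch below
builds both identifiers from a description of the nesting and the domain
list. Everything in it (the Topic class, the function names, the owner
parameter) is hypothetical and only mirrors the examples above. It assumes
that children=None renders as a bare name, that an empty tuple renders as
(no-topic-nesting), and that topic names in the public ID are simply
capitalized.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

NO_NESTING = "no-topic-nesting"


@dataclass(frozen=True)
class Topic:
    name: str
    # None -> bare name (no parentheses)
    # ()   -> "(no-topic-nesting)"
    # else -> nested topic types, rendered in alphabetical order
    children: Optional[Tuple["Topic", ...]] = None


def render_children(topic: Topic, capitalize: bool) -> str:
    """Render the parenthesized nesting part, or "" for a bare name."""
    if topic.children is None:
        return ""
    if not topic.children:
        return f"({NO_NESTING})"
    ordered = sorted(topic.children, key=lambda t: t.name)
    return "(" + ",".join(render(c, capitalize) for c in ordered) + ")"


def render(topic: Topic, capitalize: bool = False) -> str:
    name = topic.name.capitalize() if capitalize else topic.name
    return name + render_children(topic, capitalize)


def system_id(topic: Topic, domains: Sequence[str] = ()) -> str:
    """Topic types, then "--" and the sorted domain codes, then ".dtd"."""
    suffix = "--" + "-".join(sorted(domains)) if domains else ""
    return render(topic) + suffix + ".dtd"


def public_id(topic: Topic, domains: Sequence[str] = (), owner: str = "IBM") -> str:
    """Same information, with domains appended as "+xx" tokens."""
    parts = [f"-//{owner}//DTD DITA {topic.name.capitalize()}"]
    nesting = render_children(topic, capitalize=True)
    if nesting:
        parts.append(nesting)
    parts.extend("+" + d for d in sorted(domains))
    return " ".join(parts) + "//EN"


# Example 4 from the list above; input order does not matter.
example = Topic("concept", (Topic("task"), Topic("reference", (Topic("topic"),))))
codes = ["ut", "ui", "sw", "pr", "hi", "aa"]
print(system_id(example, codes))
# concept(reference(topic),task)--aa-hi-pr-sw-ui-ut.dtd
print(public_id(example, codes))
# -//IBM//DTD DITA Concept (Reference(Topic),Task) +aa +hi +pr +sw +ui +ut//EN
```

Because children and domains are sorted before rendering, two groups that
assemble the same modules in a different order end up with the same
identifiers.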
The biggest problems with this are:
1. It uses comma and parentheses in the file name. This is valid on
Windows, and is valid on UNIX (if not always easy to use), but some users
are uncomfortable using these characters in file names.
2. It does not address the issue of order or quantity in children. This is
not possible in the base modules, but you could have a specialization of
reference that nests one optional concept, followed by zero or more tasks,
followed by a reference. I would not advocate such a setup without seeing
some really good reasons, but it is legal.
3. The syntax is ugly. However, it looks like it has to be ugly to meet all
requirements.

Any thoughts on this? Suggestions on better naming schemes? The key
requirement is that if somebody sends me a file, I want to be able to look
at it and know what's needed in the DTD.
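
Going the other direction, a system ID in the proposed form can be taken
apart mechanically. The following Python sketch is one possible reading of
the syntax, not a definitive grammar: parse_system_id and its helper are
hypothetical names, and the result is a plain (name, children) tuple where
children is None for a bare name, an empty list for (no-topic-nesting), or a
list of nested (name, children) tuples.

```python
def parse_system_id(system_id):
    """Split a proposed-style system ID into (topic_tree, domain_codes)."""
    stem = system_id[:-4] if system_id.endswith(".dtd") else system_id
    # A pair of dashes marks the end of the topic types.
    head, sep, tail = stem.partition("--")
    domains = tail.split("-") if sep else []
    tree, pos = _parse_topic(head, 0)
    if pos != len(head):
        raise ValueError(f"unexpected text at offset {pos}: {head[pos:]!r}")
    return tree, domains


def _parse_topic(text, pos):
    """Parse one topic name plus optional parenthesized children."""
    start = pos
    while pos < len(text) and text[pos] not in "(),":
        pos += 1
    name = text[start:pos]
    if not name:
        raise ValueError(f"missing topic name at offset {start}")
    if pos == len(text) or text[pos] != "(":
        return (name, None), pos  # bare name, no parentheses
    pos += 1  # skip "("
    children = []
    while True:
        child, pos = _parse_topic(text, pos)
        children.append(child)
        if pos >= len(text):
            raise ValueError("unbalanced parentheses")
        if text[pos] == ",":
            pos += 1
        elif text[pos] == ")":
            pos += 1
            break
        else:
            raise ValueError(f"unexpected character at offset {pos}")
    # "(no-topic-nesting)" is a marker, not a real child topic.
    if children == [("no-topic-nesting", None)]:
        children = []
    return (name, children), pos
```

For example, parse_system_id("concept(reference)--aa-hi.dtd") returns
(("concept", [("reference", None)]), ["aa", "hi"]), which lists every topic
type and domain the DTD depends on.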

One suggestion is that users name their DTDs based on their goal --
something like concept-that-I-use-for-intros.dtd. Our main problem with
this is that every group could potentially create the same DTD with a
different name; although everybody is using the exact same markup, it
becomes more challenging to share and process documents. To edit and
process topics from 20 groups, you could need up to 20 copies of the same
DTD with different names.

Thanks-

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Source Toolkit