Subject: RE: [dita] Managing DITA DTDs - file naming issues.
From: "Esrig, Bruce (Bruce)" <esrig@lucent.com>
To: "'Robert D Anderson'" <robander@us.ibm.com>
Date: Mon, 2 May 2005 12:29:26 -0400
Perhaps some of this exists already; if so, I'd be grateful for corrections.

I was expecting that modules would be handled in two standard ways. The first one is similar to include files in C programming. The second one is similar in its information content to a comprehensive Makefile in UNIX-like systems.

The first is that a file that requires other modules would have a designated place where it states which other modules it directly requires. That shouldn't have to be inferred from the class attribute; it should be a standard part of the file's metadata, or of some other up-front portion of the file. If there is a concern that this redundant information might become inconsistent, a tool could be provided to enrich the module file with it, based on the class attribute.
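As a rough illustration of such an enrichment tool, the following Python sketch derives direct module dependencies from DITA class attribute values. It assumes the usual class value layout (a leading "-" or "+" followed by space-separated module/element tokens, most general first) and treats each module as directly requiring the module listed just before it. The function name direct_dependencies is hypothetical.

```python
def direct_dependencies(class_values):
    """Map each module named in the class values to the modules it directly requires.

    Assumes values like "- topic/body concept/conbody " (structural) or
    "+ topic/ph hi-d/b " (domain): a "-" or "+" marker, then module/element
    tokens ordered from most general to most specialized.
    """
    deps = {}
    for value in class_values:
        tokens = value.split()
        if not tokens or tokens[0] not in ("-", "+"):
            raise ValueError(f"not a DITA class attribute value: {value!r}")
        # Keep only the module half of each "module/element" token.
        modules = [token.split("/", 1)[0] for token in tokens[1:]]
        for module in modules:
            deps.setdefault(module, set())
        # Each module directly specializes the one listed just before it.
        for base, specialized in zip(modules, modules[1:]):
            if base != specialized:
                deps[specialized].add(base)
    return deps


if __name__ == "__main__":
    values = [
        "- topic/topic concept/concept ",
        "- topic/body concept/conbody ",
        "+ topic/ph hi-d/b ",
    ]
    print(direct_dependencies(values))
```

A tool along these lines could write the resulting list into the up-front metadata, so the stated dependencies always agree with the class attributes.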

The second is that a standard syntax should be defined that summarizes the entire partial order of module dependencies, including both direct and indirect dependencies. This could be presented in the same place as the direct dependency information, or in a separate place, or in a separate file.
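Given only the direct dependencies, the full partial order can be computed mechanically. This Python sketch assumes the direct dependencies are available as a dict mapping each module name to the set of modules it directly requires (a hypothetical format) and uses the standard library's graphlib (Python 3.9+) to produce both a load order and each module's complete set of direct and indirect requirements. The module name "myconcept" is a made-up example.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def dependency_summary(direct):
    """Return (load_order, closure) for a map of module -> directly required modules.

    load_order lists every module after all of the modules it requires;
    closure maps each module to all of its direct and indirect requirements.
    TopologicalSorter raises graphlib.CycleError if the dependencies form a cycle.
    """
    load_order = list(TopologicalSorter(direct).static_order())
    closure = {}
    for module in load_order:
        required = set()
        for dep in direct.get(module, ()):
            # dep precedes module in load_order, so its closure is already known.
            required.add(dep)
            required |= closure[dep]
        closure[module] = required
    return load_order, closure


if __name__ == "__main__":
    # "myconcept" is a hypothetical specialization of concept.
    direct = {"concept": {"topic"}, "myconcept": {"concept"}, "hi-d": {"topic"}}
    order, closure = dependency_summary(direct)
    print(order)
    print(closure["myconcept"])
```

Because the closure is derived rather than hand-maintained, the summary could live in a separate file and be regenerated whenever a module's direct dependencies change.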

The dependencies should be stated in terms of module names where possible. Perhaps a separate mapping mechanism is required to identify which files define which modules.
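A minimal sketch of such a mapping, in Python, might be nothing more than a table from module name to file plus a lookup that reports unknown modules. The registry contents and the function name files_for are hypothetical; a real site would maintain its own table.

```python
# Hypothetical registry: which file declares which module. The file names
# here are illustrative only; a real site would maintain its own table.
MODULE_FILES = {
    "topic": "topic.mod",
    "concept": "concept.mod",
    "reference": "reference.mod",
    "task": "task.mod",
    "hi-d": "highlightDomain.mod",
}


def files_for(modules, registry=MODULE_FILES):
    """Resolve module names to files, reporting any module with no known file."""
    missing = [m for m in modules if m not in registry]
    if missing:
        raise KeyError(f"no file registered for modules: {', '.join(missing)}")
    return [registry[m] for m in modules]


if __name__ == "__main__":
    print(files_for(["topic", "concept", "hi-d"]))
```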

Best wishes,

Bruce Esrig

-----Original Message-----
From: Robert D Anderson [mailto:robander@us.ibm.com]
Sent: Monday, May 02, 2005 11:55 AM
To: dita@lists.oasis-open.org
Subject: [dita] Managing DITA DTDs - file naming issues.


Hello all,

This question was also posted to the dita-users mailing list. For those who
do not follow the Yahoo list, we are looking for any recommendations on DTD
naming schemes to help manage the proliferation of DTDs with new
specialized domains and topic types. While the naming issue was
specifically left out of the DITA spec, we're guessing that this issue will
come up for others in this group, so we're looking for input here as well.
Thanks for any suggestions.
-----------------

This question relates tangentially to Chris Wong's question last week about
DTD practices, but does not exactly address it, so I thought I'd start a
new thread.

Within IBM, we're anticipating the upcoming proliferation of topic and
domain specializations. Until now, all DTDs have used pretty simple system
and public identifiers - for example, concept.dtd refers to concept with
all supported domains. However, if every concept topic uses that, how do
you know what to use for validation? A concept with no domains, with all of
the OASIS domains, or a concept with the OASIS domains plus one of mine?

This boils down to one central problem. I would like to be able to look at
a file, and somehow figure out what specialization modules were used to
create it (also, what modules are needed to validate and process). The only
way this can be done is by having a unique public ID, system ID, or both --
unless we use a hack like storing the DTD info in a PI, but I don't want to
start a debate here over the value of PIs.

There are several distinct problems to worry about. How should you identify
each of these:
1. A concept with 6 domains, when there are 10 domains in use at your site?
2. A concept that cannot nest itself (or anything else)?
3. A concept that nests only reference, when reference then nests itself?
4. A concept that nests (task or reference), when reference nests (topic),
and task nests nothing?

We're now considering using a form of regular expression syntax as part of
both the public and system IDs. The current proposal is rather simplistic:
it uses parentheses to show what nests (no parentheses means a topic nests
only itself). When there are multiple children, or multiple domains, the
values are listed in alphabetical order. I've used the full name for topic
specializations and 2-letter codes for the domains; these are taken from
the values used inside the class attribute for each specialization. Domains
are tacked on to the end of the system ID, after a pair of dashes that
marks the end of the topic types; in the public ID, each domain is added
with a plus. Here are proposed identifiers for the doctypes described
above:
1. A concept with 6 domains; concept nests itself
concept--aa-hi-pr-sw-ui-ut.dtd
-//IBM//DTD DITA Concept +aa +hi +pr +sw +ui +ut//EN
2. A concept that cannot nest anything (no domains)
concept(no-topic-nesting).dtd
-//IBM//DTD DITA Concept (no-topic-nesting)//EN
3. A concept that nests reference, when reference nests only itself - 6
domains again:
concept(reference)--aa-hi-pr-sw-ui-ut.dtd
-//IBM//DTD DITA Concept (Reference) +aa +hi +pr +sw +ui +ut//EN
4. Concept that nests (task or reference), reference nests (topic), same 6
domains:
concept(reference(topic),task)--aa-hi-pr-sw-ui-ut.dtd
-//IBM//DTD DITA Concept (Reference(Topic),Task) +aa +hi +pr +sw +ui +ut//EN
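The scheme is mechanical enough to generate. Below is a minimal Python sketch that follows only the rules stated above: no parentheses for a topic that nests only itself, "(no-topic-nesting)" for one that nests nothing, alphabetical order for children and domains, domains after "--" in the system ID and as "+xx" in the public ID. The TopicType class, the helper functions, and the owner parameter are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NO_NESTING = "no-topic-nesting"


@dataclass(frozen=True)
class TopicType:
    """A topic type plus the topic types it may nest.

    nests=None: nests only itself (written with no parentheses).
    nests=():   nests nothing (written as "(no-topic-nesting)").
    otherwise:  the topic types it may nest, written inside parentheses.
    """
    name: str
    nests: Optional[Tuple["TopicType", ...]] = None


def _cap(name: str) -> str:
    # Public IDs capitalize topic names ("reference" -> "Reference").
    return name[:1].upper() + name[1:]


def _nesting(topic: TopicType, public: bool) -> str:
    """Render the parenthesized part; empty when the topic nests only itself."""
    if topic.nests is None:
        return ""
    if not topic.nests:
        return "(" + NO_NESTING + ")"
    children = sorted(topic.nests, key=lambda t: t.name)  # alphabetical order
    return "(" + ",".join(_render(c, public) for c in children) + ")"


def _render(topic: TopicType, public: bool) -> str:
    name = _cap(topic.name) if public else topic.name
    return name + _nesting(topic, public)


def system_id(topic: TopicType, domains: List[str]) -> str:
    sid = _render(topic, public=False)
    if domains:
        # A pair of dashes marks the end of the topic types.
        sid += "--" + "-".join(sorted(domains))
    return sid + ".dtd"


def public_id(topic: TopicType, domains: List[str], owner: str = "IBM") -> str:
    parts = [_cap(topic.name)]
    nesting = _nesting(topic, public=True)
    if nesting:
        parts.append(nesting)  # top-level nesting is set off by a space
    parts += ["+" + d for d in sorted(domains)]
    return f"-//{owner}//DTD DITA {' '.join(parts)}//EN"


if __name__ == "__main__":
    six = ["aa", "hi", "pr", "sw", "ui", "ut"]
    # Identifier 4: concept nests (reference or task); reference nests topic.
    example4 = TopicType("concept", (TopicType("task"),
                                     TopicType("reference", (TopicType("topic"),))))
    print(system_id(example4, six))
    print(public_id(example4, six))
```

Applied literally, these rules write a topic that nests nothing as, for example, task(no-topic-nesting); the sketch models the task in identifier 4 as nesting itself so that it reproduces that identifier as written.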
The biggest problems with this are:
1. It uses commas and parentheses in file names. These characters are
valid on both Windows and UNIX (though on UNIX parentheses must be quoted
or escaped in the shell), but some users are uncomfortable using them in
file names.
2. It does not address the order or number of nested children. The base
modules cannot constrain these, but you could have a specialization of
reference that nests one optional concept, followed by zero or more tasks,
followed by a reference. I would not advocate such a setup without seeing
some really good reasons, but it is legal.
3. The syntax is ugly. However, it looks like it has to be ugly to meet all
requirements.
Any thoughts on this? Suggestions on better naming schemes? The key
requirement is that if somebody sends me a file, I want to be able to look
at it and know what's needed in the DTD.
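To illustrate that requirement, here is a Python sketch of the reverse direction: a small recursive-descent parser that reads a system ID in the proposed form and reports which topic types and domains a document needs. It assumes exactly the conventions described above (parentheses for nesting, "(no-topic-nesting)" for none, domains after "--" as 2-letter codes separated by single dashes); parse_system_id and required_topic_types are hypothetical names, and public IDs are not handled.

```python
NO_NESTING = "no-topic-nesting"


def parse_system_id(system_id):
    """Split a system ID such as "concept(reference(topic),task)--aa-hi.dtd"
    into a nesting tree and a list of domain codes.

    Each tree node is (name, children): children is None when the topic nests
    only itself, [] when it nests nothing, or a list of child nodes.
    """
    if not system_id.endswith(".dtd"):
        raise ValueError(f"expected a .dtd file name: {system_id!r}")
    stem = system_id[: -len(".dtd")]
    topics, _, domain_part = stem.partition("--")
    domains = domain_part.split("-") if domain_part else []
    tree, end = _parse_topic(topics, 0)
    if end != len(topics):
        raise ValueError(f"unexpected text at position {end}: {topics!r}")
    return tree, domains


def _parse_topic(text, pos):
    """Parse one topic name plus optional parenthesized nesting, starting at pos."""
    end = pos
    while end < len(text) and text[end] not in "(),":
        end += 1
    name = text[pos:end]
    if not name:
        raise ValueError(f"missing topic name at position {pos}: {text!r}")
    if end == len(text) or text[end] != "(":
        return (name, None), end  # no parentheses: nests only itself
    marker = "(" + NO_NESTING + ")"
    if text.startswith(marker, end):
        return (name, []), end + len(marker)
    children, pos = [], end + 1
    while True:
        child, pos = _parse_topic(text, pos)
        children.append(child)
        if pos < len(text) and text[pos] == ",":
            pos += 1
        elif pos < len(text) and text[pos] == ")":
            return (name, children), pos + 1
        else:
            raise ValueError(f"unbalanced parentheses in {text!r}")


def required_topic_types(tree):
    """Collect every topic type named anywhere in the nesting tree."""
    name, children = tree
    found = {name}
    for child in children or []:
        found |= required_topic_types(child)
    return found


if __name__ == "__main__":
    tree, domains = parse_system_id("concept(reference(topic),task)--aa-hi-pr-sw-ui-ut.dtd")
    print(sorted(required_topic_types(tree)), domains)
```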

One suggestion is that users name their DTDs based on their goal --
something like concept-that-I-use-for-intros.dtd. Our main problem with
this is that every group could potentially create the same DTD with a
different name; although everybody is using the exact same markup, it
becomes more challenging to share and process documents. To edit and
process topics from 20 groups, you could need up to 20 copies of the same
DTD with different names.

Thanks-

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Source Toolkit