See comments below.
JoAnn
JoAnn T. Hackos, PhD
President
Comtech Services, Inc.
710 Kipling Street, Suite 400
Denver, CO 80215
303-232-7586
joann.hackos@comtech-serv.com
www.comtech-serv.com
From: Erik Hennum [mailto:ehennum@us.ibm.com]
Sent: Friday, September 30, 2005 10:52 PM
To: dita@lists.oasis-open.org
Subject: RE: [dita] index terms
Hi,
Esteemed TC:
Before incorporating older publishing approaches in DITA, we should consider
whether those approaches support the topic architecture and reuse.
Regarding indexing markers, I'd submit that most writers don't think they're
indexing a point. They think they're indexing the content around the index
marker.
If we want to be specific about our interpretation of a standalone index
marker, a few alternatives are obvious:
- Treat the index marker as occurring within a range of indexed content with
  an unspecified but nearby start and end point.
- Treat the index marker as occurring at the start of the range of indexed
  content, where the end of the range is unspecified but nearby.
- Treat the container element for the index marker as delimiting the range of
  indexed content. [I think this is a good alternative. Typically index
  markers are vaguely associated with a paragraph, heading, or other
  container element. They may index a particular word in context but they’re
  not intended to point to that word, at least not in a professional index.
  Readers want to find information in a text, not words.]
None of the three approaches does violence to the fundamental assumption that
the indexed content is around the index marker. Each of the three can lead to
surprises for the writer.
Start and end markers, however, pose problems for reuse. Taking up the problem
raised by JoAnn, let's say you want to index a range of three topics about web
applications and put a start marker at the start of the first topic and an end
marker at the end of the third topic:
<topichead "Creating a web storefront">
  <topicref "Installing the application server" ... /> <!-- start -->
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ... /> <!-- end -->
</topichead>
In another information set, however, the start and end topics are organized in
a different way:
<topichead "Developing server applications">
  <topicref "Developing web applications" ... /> <!-- end -->
  <topicref "Developing database applications" ... />
  ...
</topichead>
<topichead "Server administration">
  <topicref "Configuring LDAP" ... />
  <topicref "Installing the application server" ... /> <!-- start -->
  ...
</topichead>
In the second information set, the end marker precedes the start marker. Worse,
content completely unrelated to web applications is in the middle of the range.
Worst, there's no way to fix the problem for the second deliverable without
invalidating the first deliverable.
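
The failure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each deliverable is flattened to an ordered list of topic titles (the elided "..." entries are simply left out), and `resolve_range` is a hypothetical helper, not part of DITA or any toolkit.

```python
def resolve_range(topics, start, end):
    """Return the topics covered by a start/end marker pair, in map order.

    Raises ValueError when the end marker precedes the start marker.
    """
    i, j = topics.index(start), topics.index(end)
    if j < i:
        raise ValueError(f"end marker ({end!r}) precedes start marker ({start!r})")
    return topics[i:j + 1]


# The markers live in the topics, so they travel with them into every map.
START = "Installing the application server"
END = "Developing web applications"

# First information set: the order the writer had in mind.
storefront = [
    "Installing the application server",
    "Common security policies for eCommerce",
    "Developing web applications",
]

# Second information set: same topics, different organization.
server_docs = [
    "Developing web applications",
    "Developing database applications",
    "Configuring LDAP",
    "Installing the application server",
]

first_range = resolve_range(storefront, START, END)

try:
    resolve_range(server_docs, START, END)
    second_error = None
except ValueError as exc:
    second_error = str(exc)
```

The first call returns all three storefront topics. The second cannot produce a range at all, and nothing done inside the topics can repair it without changing what the first call returns.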
The problem is architectural: properties that span multiple topics should be
specified in the map context and not in the topic content.
We could move the start and end markers into the map itself:
<topichead "Creating a web storefront">
  <topicref "Installing the application server" ...>
    <topicmeta>
      <keywords>
        <indexterm>Web applications
          <index-range-start/>
        </indexterm>
      </keywords>
    </topicmeta>
  </topicref>
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ...>
    <topicmeta>
      <keywords>
        <indexterm>Web applications
          <index-range-end/>
        </indexterm>
      </keywords>
    </topicmeta>
  </topicref>
</topichead>
Let's say you add conditional metadata, however, and filter out the start
topic, the end topic, or both. It's ambiguous whether to apply the index term
to the middle topic. Maybe the middle topic belongs in the indexed range only
as part of a sequence including the start and end topics.
More importantly, it is much more natural to leverage the grouping provided by
the parent element:
<topichead "Creating a web storefront">
  <topicmeta>
    <keywords>
      <indexterm>Web applications</indexterm>
    </keywords>
  </topicmeta>
  <topicref "Installing the application server" ... />
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ... />
</topichead>
Finally, one of the main reasons for tagging is to define semantic units. Why
wouldn't we want to take advantage of those semantic units when indexing?
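
As a rough sketch of how a processor might use the container as the indexed range, the Python below reads a well-formed version of the map fragment above. The assumptions are mine: titles are carried in a `navtitle` attribute (the fragments above use a shorthand), `index_entries` is a hypothetical helper, and the `exclude` parameter merely stands in for conditional filtering.

```python
import xml.etree.ElementTree as ET

MAP = """
<map>
  <topichead navtitle="Creating a web storefront">
    <topicmeta>
      <keywords>
        <indexterm>Web applications</indexterm>
      </keywords>
    </topicmeta>
    <topicref navtitle="Installing the application server"/>
    <topicref navtitle="Common security policies for eCommerce"/>
    <topicref navtitle="Developing web applications"/>
  </topichead>
</map>
"""


def index_entries(map_xml, exclude=()):
    """Map each index term to the titles of the topics its container groups.

    Titles listed in `exclude` are treated as filtered out of the deliverable.
    """
    entries = {}
    root = ET.fromstring(map_xml)
    for container in root.iter("topichead"):
        # Index terms declared on the container itself.
        terms = [
            (term.text or "").strip()
            for term in container.findall("topicmeta/keywords/indexterm")
        ]
        # Direct child topics that survive filtering.
        titles = [
            ref.get("navtitle")
            for ref in container.findall("topicref")
            if ref.get("navtitle") not in exclude
        ]
        for term in terms:
            entries.setdefault(term, []).extend(titles)
    return entries
```

Calling `index_entries(MAP, exclude={"Installing the application server"})` still attaches "Web applications" to the two remaining topics. Because no start or end point exists, filtering a topic cannot orphan the range.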
In summary, defining a range with start and end points works better for a
single, static discourse flow than for topics that can be organized in many
different ways. [I believe Erik has stated the issues correctly here. I wonder
if we might define a best practice that does not include ranges, for all the
reasons Erik has provided above. The purpose of a page range in an index is to
indicate to the reader that the topic is covered more thoroughly there than in
other references. A reader would select the page range first because that would
indicate a longer discourse than a single page reference. Of course, none of
this applies to the way indexes work in help systems; page ranges don’t apply.
Perhaps we should not support page ranges at all in a topic architecture but
rather provide another way to indicate the “preferred” reference for a topic. I
suspect that no one ever looks at the last page of a range but always turns to
the first page of the range. The hierarchical arrangement of topics in a map
and in the rendering would also indicate a range if the referenced topic is at
a higher level than several subsequent topics. If we can add an attribute that
indicates a “preferred” or “primary” reference to a subject, that might take
care of the reader’s requirement.]
Regarding synonyms, it should at least be possible to maintain associations
between controlled vocabularies globally. I've known publications departments
(for instance, at Informix) that maintained all of the see synonyms at the end
of the introduction because (by definition) a see synonym isn't associated with
any particular piece of content. In DITA, however, the map is a much more natural
place to maintain definitions that aren't associated with specific content. [How would this work?]
Bruce has a good point about centralizing index labels through conref.
Especially if keyrefs can be used in conref, that would seem to meet the
requirement for being able to maintain index labels centrally. [I don’t exactly follow. How would the centralized
index labels be maintained through conref and keyref?]
Because an index term is always about content, an about-href attribute on
indexterm would likely be overkill. Topicref already gives you the ability to
attach an index term to a topic. (By the way, in passing, a popup for
associative index links might be more useful than a ring of links.)
A last consideration. The <term> and <keyword> elements delimit
controlled vocabularies that are embedded in the discourse. Should the writer
have to add an index marker to index such instances of controlled vocabularies?
Or would we be better off indexing delimited vocabularies (possibly under the
control of policies)? [If I’m
following this correctly, it might lead to a concordance rather than an index.
You do not want to index all instances of a term or keyword but only those that
link to relevant information to which the term is a key.]
What do you think?
Erik Hennum
ehennum@us.ibm.com