
Subject: RE: [dita] index terms

From: Erik Hennum <ehennum@us.ibm.com>
To: dita@lists.oasis-open.org
Date: Fri, 30 Sep 2005 21:51:32 -0700

Hi, Esteemed TC:

Before incorporating older publishing approaches in DITA, we should consider whether those approaches support the topic architecture and reuse.

Regarding indexing markers, I'd submit that most writers don't think they're indexing a point. They think they're indexing the content around the index marker.

If we want to be specific about our interpretation of a standalone index marker, a few alternatives are obvious:

Treat the index marker as occurring within a range of indexed content with an unspecified but nearby start and end point.
Treat the index marker as occurring at the start of the range of indexed content where the end of the range is unspecified but nearby.
Treat the container element for the index marker as delimiting the range of indexed content.

None of the three approaches does violence to the fundamental assumption that the indexed content is around the index marker. Each of the three can lead to surprises for the writer.

Start and end markers, however, pose problems for reuse. Taking up the problem raised by JoAnn, let's say you want to index a range of three topics about web applications and put a start marker at the start of the first topic and an end marker at the end of the third topic:

<topichead "Creating a web storefront">
  <topicref "Installing the application server" ... />
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ... />
</topichead>

In another information set, however, the start and end topics are organized in a different way:

<topichead "Developing server applications">
  <topicref "Developing web applications" ... />
  <topicref "Developing database applications" ... />
  ...
</topichead>

<topichead "Server administration">
  <topicref "Configuring LDAP" ... />
  <topicref "Installing the application server" ... />
  ...
</topichead>

In the second information set, the end marker precedes the start marker. Worse, content completely unrelated to web applications is in the middle of the range. Worst, there's no way to fix the problem for the second deliverable without invalidating the first deliverable.
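To make the conflict concrete, here is a minimal sketch of what the two marked topics might contain. It assumes the markers are carried in each topic's prolog; the index-range-start and index-range-end elements are hypothetical (they are not defined by DITA and simply reuse the names from the map example further on), and the ids and body text are illustrative.

```xml
<!-- Hypothetical sketch: index-range-start and index-range-end are not DITA
     elements; ids, titles, and body text are illustrative. -->
<dita>
  <!-- First topic of the range: the start marker is part of the topic itself -->
  <topic id="installing_appserver">
    <title>Installing the application server</title>
    <prolog>
      <metadata>
        <keywords>
          <indexterm>Web applications<index-range-start/></indexterm>
        </keywords>
      </metadata>
    </prolog>
    <body><p>(topic content)</p></body>
  </topic>
  <!-- Last topic of the range: the end marker is part of the topic itself -->
  <topic id="developing_webapps">
    <title>Developing web applications</title>
    <prolog>
      <metadata>
        <keywords>
          <indexterm>Web applications<index-range-end/></indexterm>
        </keywords>
      </metadata>
    </prolog>
    <body><p>(topic content)</p></body>
  </topic>
</dita>
```

Because each marker lives inside its topic, every map that pulls in either topic inherits half of the range, in whatever order that map happens to arrange the topics.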

The problem is architectural: properties that span multiple topics should be specified in the map context and not in the topic content.

We could move the start and end markers into the map itself:

<topichead "Creating a web storefront">
  <topicref "Installing the application server" ...>
    <topicmeta>
      <keywords>
        <indexterm>Web applications
          <index-range-start/>
        </indexterm>
      </keywords>
    </topicmeta>
  </topicref>
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ...>
    <topicmeta>
      <keywords>
        <indexterm>Web applications
          <index-range-end/>
        </indexterm>
      </keywords>
    </topicmeta>
  </topicref>
</topichead>

Let's say you add conditional metadata, however, and filter out the start topic, the end topic, or both. It's ambiguous whether to apply the index term to the middle topic. Maybe the middle topic belongs in the indexed range only as part of a sequence including the start and end topics.
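A sketch of that filtering case, under stated assumptions: the audience value and the href file names are illustrative, navtitle is used for the titles, and index-range-start and index-range-end remain hypothetical elements that DITA does not define.

```xml
<!-- Hypothetical sketch: the range markers are not DITA elements, and the
     audience value and href targets are illustrative. -->
<map>
  <topichead navtitle="Creating a web storefront">
    <!-- Excluding audience="administrator" drops this topicref, and the start marker with it -->
    <topicref navtitle="Installing the application server"
              href="installing.dita" audience="administrator">
      <topicmeta>
        <keywords>
          <indexterm>Web applications<index-range-start/></indexterm>
        </keywords>
      </topicmeta>
    </topicref>
    <!-- Unmarked middle topic: in or out of the range once the start is gone? -->
    <topicref navtitle="Common security policies for eCommerce" href="security.dita"/>
    <topicref navtitle="Developing web applications" href="developing.dita">
      <topicmeta>
        <keywords>
          <indexterm>Web applications<index-range-end/></indexterm>
        </keywords>
      </topicmeta>
    </topicref>
  </topichead>
</map>
```

After filtering for any other audience, the processor sees an end marker with no matching start, and nothing in the markup says how the middle topic should be treated.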

More importantly, it is much more natural to leverage the grouping provided by the parent element:

<topichead "Creating a web storefront">
  <topicmeta>
    <keywords>
      <indexterm>Web applications</indexterm>
    </keywords>
  </topicmeta>
  <topicref "Installing the application server" ... />
  <topicref "Common security policies for eCommerce" ... />
  <topicref "Developing web applications" ... />
</topichead>

Finally, one of the main reasons for tagging is to define semantic units. Why wouldn't we want to take advantage of those semantic units when indexing?

In summary, defining a range with start and end points works better for a single, static discourse flow than for topics that can be organized in many different ways.

Regarding synonyms, it should at least be possible to maintain associations between controlled vocabularies globally. I've known publications departments (for instance, at Informix) that maintained all of the see synonyms at the end of the introduction because (by definition) a see synonym isn't associated with any particular piece of content. In DITA, however, the map is a much more natural place to maintain definitions that aren't associated with specific content.

Bruce has a good point about centralizing index labels through conref. Especially if keyrefs can be used in conref, that would seem to meet the requirement for being able to maintain index labels centrally.
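As a rough illustration of centralizing labels through conref, the sketch below keeps the label in one topic and pulls it in by reference. The file name indexlabels.dita and the ids are hypothetical, and it assumes the processor resolves conref on indexterm as it does on other elements; the keyref variant is not shown.

```xml
<!-- indexlabels.dita (hypothetical file name): the one place labels are maintained -->
<topic id="indexlabels">
  <title>Index labels</title>
  <prolog>
    <metadata>
      <keywords>
        <indexterm id="webapps">Web applications</indexterm>
      </keywords>
    </metadata>
  </prolog>
</topic>
```

```xml
<!-- In any topic or map that needs the label; the target is file#topicid/elementid -->
<keywords>
  <indexterm conref="indexlabels.dita#indexlabels/webapps"/>
</keywords>
```

Changing the wording of the label in indexlabels.dita would then update every place that references it.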

Because an index term is always about content, an about-href attribute on indexterm would likely be overkill. Topicref already gives you the ability to attach an index term to a topic. (In passing, a popup for associative index links might be more useful than a ring of links.)

A last consideration. The <term> and <keyword> elements delimit controlled vocabularies that are embedded in the discourse. Should the writer have to add an index marker to index such instances of controlled vocabularies? Or would we be better off indexing delimited vocabularies (possibly under the control of policies)?

What do you think?

Erik Hennum
ehennum@us.ibm.com

Follow-Ups:
- Re: [dita] index terms
  - From: Eliot Kimber <ekimber@innodata-isogen.com>