Subject: Indexterm: page ranges

Thanks to all the TC for responding at length. I'd like to respond here to the very first topic, Erik's "GENERAL" heading on an indexterm covering a unit of content. My difficulty in seeing indexterms this way -- apart from the fact that this is not how readers and authors would see them -- is that XML must be well formed. There can be only one hierarchy active. On the other hand, index entries can reflect completely orthogonal organizations. You can have index entries that overlap/straddle each other or their parent nodes. There is no reason to assume that an index entry range will coincide with a container in well-formed XML.
Indeed, an index range that merits its own container may face an ontological problem: according to Microsoft's manual of style, it should not exist. Such a sustained discussion can merit its own topic or should otherwise belong only in the table of contents. If it is part of the overall document structure, it probably is a candidate for the TOC, not the index. Readers use the index for other information.
Here's a concrete example. Suppose I wrote a task on how to change my car's spark plugs. The sequence goes something like:
Suppose I want my reader to be able to look up where auto tools are used in my new masterpiece "Auto Misrepair for Dummies" book that incorporates this task. Using pseudo-XML notation, the relevant index entry ranges go like:
<anti-seize compound><socket extension><torque wrench>
</anti-seize compound></socket extension></torque wrench>
Apart from the fact that these ranges completely overlap each other, they also cross the task <step> element boundaries and child elements of <step>: cmd,  info, substeps, tutorialinfo etc. Human languages are such annoyingly undisciplined things. That is why I felt compelled to propose page range start/end markers outside of the XML structure.
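To make the well-formedness problem concrete, here is a minimal Python sketch. It is not part of any DITA tooling; the function name and the (kind, name) event tuples are illustrative assumptions. It treats each range start and end as an open or close event and checks whether the events nest the way XML requires.

```python
def is_well_formed(events):
    """Return True if the open/close events nest properly, as XML requires."""
    stack = []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif not stack or stack.pop() != name:
            # A close event must match the most recently opened range.
            return False
    return not stack  # every opened range must also be closed


# The three index ranges from the spark plug example, in the order written above.
overlapping = [
    ("open", "anti-seize compound"),
    ("open", "socket extension"),
    ("open", "torque wrench"),
    ("close", "anti-seize compound"),
    ("close", "socket extension"),
    ("close", "torque wrench"),
]

# The same ranges closed in reverse order, which is what element nesting demands.
nested = overlapping[:3] + list(reversed(overlapping[3:]))

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```

The first sequence fails as soon as "anti-seize compound" tries to close while "torque wrench" is still open, which is why these ranges cannot be expressed as containers.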
There are a few other things I want to cover from the discussion on page ranges:
-----Original Message-----
From: Erik Hennum [mailto:ehennum@us.ibm.com]
Sent: Friday, September 30, 2005 12:39 AM
To: Chris Wong
Cc: dita@lists.oasis-open.org
Subject: RE: [dita] Groups - DITA 1.1 Issue #45: Add See, See Also indexing elements (IssueNumber45.html) uploaded

Hi, Chris:

Interesting issues...

GENERAL. Fundamentally, what is an index term? In a topic architecture, I'd submit that we should regard an index term as a semantic label attached to a unit of content (such as a phrase, paragraph, list, table, section, topic, or collection of topics). We should not regard an index term as attached to a point within a discourse flow because a point doesn't have any meaning.

The following example

<p>...<indexterm>Application servers</indexterm>...</p>

makes the statement:

"This paragraph is about application servers."

That's true regardless of where the index term appears within the paragraph. To indicate that the index term applies only to a sentence, the writer could wrap a <ph> element around the indexed sentence. That is, the container of the index term defines the unit of content that's about application servers.

PAGE RANGES. From that perspective, we shouldn't need start and end markers for a range. By definition, the container specifies the range for the indexed unit of content. (For an index marker within a prolog, the effective container is the topic.)

A formatter might apply the rule that, if the container spans more than one or two pages (or some threshold controlled by a style policy), the generated index shows a page range. Otherwise, the formatter emits the start page for the container.

That way, the writer doesn't have to maintain page ranges depending on the output. If the writer starts with an index marker on a section but adds content to the section until it stretches to three pages (shudder), the writer doesn't have to change the index marker to start and end markers. If the section fits on a single page when output as 8 1/2 by 11 but flows over three pages when output as A5 (or whatever), the writer doesn't have to revise the topic depending on the output.
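The formatter rule described above could be sketched roughly as follows. This is only an illustration in Python, not an existing formatter API: the function name, its parameters, and the default threshold of two pages are all assumptions standing in for whatever a style policy would define.

```python
def index_locator(start_page, end_page, threshold=2):
    """Return the page reference a generated index would show for a container.

    start_page / end_page: pages on which the container begins and ends
    in a given output. threshold: hypothetical style-policy value; a
    container spanning more pages than this is shown as a range.
    """
    if end_page < start_page:
        raise ValueError("end_page must not precede start_page")
    pages_spanned = end_page - start_page + 1
    if pages_spanned > threshold:
        return f"{start_page}-{end_page}"
    return str(start_page)


# The same section, with the same markup, in two hypothetical page layouts.
print(index_locator(12, 12))  # fits on one page: "12"
print(index_locator(12, 14))  # flows over three pages: "12-14"
```

The point of the sketch is that the decision depends only on where the container lands in the output, so the source markup stays the same for both layouts.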

In the implementation, during the topic merge phase, the preprocessor could insert processing instructions at the start and end of the container if convenient for easy processing of the range.

If you find yourself wanting to index a range of content that's a subset of a container, you should ask yourself whether the content merits a container. That is, requiring that semantic units have containers is consistent with the topic-oriented approach of assembling larger structures from small, granular, typed units of content.

In passing, the same ambiguity that came up for the <data> element rears its ugly head here. If I put an index marker within a topicmeta for a topicref, should the range be the referenced topic or the entire branch of the map? Do we need a systematic way to distinguish the properties of the referenced topic from the properties of the referencing collection?

SEE vs SEE ALSO. I'm wondering if we could produce both outputs correctly from a single element that expresses synonyms for index terms. As I understand the publishing convention for "see" and "see also," the correct tag depends on which terms have instances:

In other words, the same synonym might be a "see" or "see also" or nothing, depending on whether the aggregating map has assembled topics that have instances of the source and target term.
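As a rough sketch of how a processor might make that decision, assuming the usual publishing convention ("see" when the source term has no instances of its own, "see also" when both terms have instances, and no cross-reference when the target has none), something like the following Python would do. The function name and its boolean parameters are hypothetical.

```python
def cross_reference(source_has_instances, target_has_instances):
    """Pick the cross-reference type for a synonym, or None to suppress it.

    Assumed convention (not taken from any specification):
      - target has no instances       -> no cross-reference (it would dangle)
      - source has no instances       -> "see"
      - both terms have instances     -> "see also"
    """
    if not target_has_instances:
        return None
    if source_has_instances:
        return "see also"
    return "see"


# The same synonym evaluated against different assembled maps.
print(cross_reference(False, True))  # see
print(cross_reference(True, True))   # see also
print(cross_reference(True, False))  # None
```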

GLOBAL SORTS AND SYNONYMS. I'd submit that it should be possible to declare sort keys and see / see also synonyms as global definitions rather than definitions associated with specific instances.

After all, what if an index term has a sort key in one instance and either no sort key or a different sort key in another instance of the index term?

Also, when the output is generated, a see / see also synonym applies to every instance of the index term rather than to a specific instance. Finally, the most typical reason for defining synonyms is to identify related content. Because the map controls the assembly of content, synonyms would sensibly be an aspect of assembly.

Perhaps it would make sense to define sorts and synonyms within the <keywords> element. That way, the common case (global definitions of sorts and synonyms) is easy, the edge case (content that requires sorts or synonyms) is awkward but possible, and index terms embedded within content don't provide a bulky distraction from the discourse flow.

Maybe something like the following:

In passing, the keyref proposal (#40) should make it possible to index the topic content but assign the labels to those index terms in the map. Producing a good index often requires adjusting the labels based on the labels of the other indexed content. Having to go back into the content to align index terms is an enormous pain and an inhibitor for reuse -- especially if you'd like to freeze the content but perform final production on the index.

MISCELLANEOUS. I'd agree with Paul that, with <indexterm> (as with <section>), there's an implied structure on the content that can only be validated by an XML parser when the grammar can impose constraints on mixed content models. Regarding linking, a generated index in HTML or PDF output should have links to the instances of the index terms. I suppose the instances of a term could sensibly link to one another as a convenience (if the hotspot isn't too distracting).

What do you think?

Erik Hennum

"Chris Wong" <cwong@idiominc.com> wrote on 09/28/2005 09:15:28 AM:

> I'm kind of surprised to see no questions or objections so far to
> this proposal. I hear that people can have strong opinions about
> this subject. I'd like to see any debate get underway so we will
> have time to move this issue forward. Anyone?
