Subject: Indexterm: page ranges
- From: "Chris Wong" <email@example.com>
- To: "DITA-TC (E-mail)" <firstname.lastname@example.org>
- Date: Mon, 3 Oct 2005 17:35:29 -0400
Thanks to all the TC for responding at length. I'd like to respond here to the very first
topic, Erik's "GENERAL" heading on an indexterm covering a unit of content. My
difficulty in seeing indexterms this way -- apart from the fact that this is not
how readers and authors would see them -- is that XML must be well formed. There
can be only one hierarchy active. On the other hand, index entries can reflect
completely orthogonal organizations. You can have index entries that
overlap/straddle each other or their parent nodes. There is no reason to
assume that an index entry range can exist within well-formed XML.
Indeed, an index range that merits its
own container may face an ontological problem: according to Microsoft's manual
of style, it should not exist. Such a sustained discussion can merit its own
topic or should otherwise belong only in the table of contents. If it is part of
the overall document structure, it probably is a candidate for the TOC, not the
index. Readers use the index for other information.
Here is a concrete example. Suppose I wrote a task on how to change my car's spark
plugs. The sequence goes something like:
- I talk about gapping and prepping the new spark plugs here. I describe how to use the
  anti-seize compound in loving detail.
- I talk about removing the old spark plugs. I describe use of my socket extension and
  then my torque wrench.
- I talk about inserting the new spark plugs here. I caution about getting anti-seize
  compound in the wrong places. I mention my socket extension.
- I describe tightening the new spark plugs using my torque wrench in excruciating
  detail.
I want my reader to be able to look up where auto tools are used in my new
masterpiece "Auto Misrepair for Dummies" book that incorporates this task. Using
pseudo-XML notation, the relevant index entry ranges go
Besides the fact that these ranges completely overlap each other, they also cross the
task <step> element boundaries and child elements of <step>:
cmd, info, substeps, tutorialinfo etc. Human languages are such annoyingly
undisciplined things. That is why I felt compelled to propose page range
start/end markers outside of the XML structure.
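To make the overlap problem concrete, here is a minimal Python sketch of the spark plug task using empty start/end markers. The element names `range-start` and `range-end`, the `term` attribute, and the `n` attribute on `<step>` are hypothetical names chosen for illustration only; they are not the syntax of the proposal. The sketch only shows that such markers keep the XML well formed even when the ranges they describe overlap without nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: empty marker elements sit inside the normal step
# hierarchy, so the document stays well formed whatever the ranges do.
TASK = """
<task>
  <step n="1"><range-start term="anti-seize compound"/><cmd>Gap and prep the new plugs.</cmd></step>
  <step n="2"><range-start term="socket extension"/><cmd>Remove the old plugs.</cmd>
    <info><range-start term="torque wrench"/>Loosen each plug.</info></step>
  <step n="3"><cmd>Insert the new plugs.</cmd>
    <info>Keep compound off the electrodes.<range-end term="anti-seize compound"/>
    <range-end term="socket extension"/></info></step>
  <step n="4"><cmd>Tighten the plugs.<range-end term="torque wrench"/></cmd></step>
</task>
"""

def collect_ranges(xml_text):
    """Map each indexed term to the (first step, last step) its markers enclose."""
    root = ET.fromstring(xml_text)
    open_at = {}   # term -> step number where its range started
    ranges = {}    # term -> (start step, end step)
    for step in root.iter("step"):
        n = int(step.get("n"))
        for el in step.iter():  # the step and its descendants, in document order
            if el.tag == "range-start":
                open_at[el.get("term")] = n
            elif el.tag == "range-end":
                term = el.get("term")
                ranges[term] = (open_at.pop(term), n)
    return ranges

def overlaps_without_nesting(a, b):
    """True if two ranges straddle each other, so neither could contain the other."""
    (a0, a1), (b0, b1) = a, b
    return a0 < b0 <= a1 < b1 or b0 < a0 <= b1 < a1

ranges = collect_ranges(TASK)
print(ranges)
print(overlaps_without_nesting(ranges["anti-seize compound"], ranges["torque wrench"]))
```

The anti-seize range (steps 1 to 3) and the torque wrench range (steps 2 to 4) straddle each other, so no pair of container elements could wrap both.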
A few other things I want to cover from the discussion on page ranges:
- A page range does not imply that the entry is the primary entry. It only implies
  length. Otherwise, an entry that contains many page-range references
  cannot tell us which one is primary. People sometimes indicate primary
  entries by setting the page number reference in bold. My colleague uses an
  entry like "XYZ, About" to similarly indicate it is primary. I did not
  address the ability to indicate a primary entry in the original proposal:
  is this a desirable feature apart from page ranges?
- Page ranges do not merely mean multiple occurrences of the term. The Chicago Manual
  of Style distinguishes between a continued discussion (e.g., 34-36) and
  individual references on a sequence of pages (e.g., 34, 35, 36). The ability
  to combine index entry references is not a substitute for explicit page
  ranges.
I understand the concerns regarding topic-spanning indexterms. I would like to
point out that the current proposal disallows page range markers from starting
in one topic and ending in another. For topic spanning, it mentions using
indexterms at the map level and coalescing adjacent topics' indexterms. Would
people be comfortable with a proposal that only allows the map-level method of
spanning topics (i.e., jettisoning the latter alternative)? I'm talking about
Erik Hennum's description of using the start/end range markers in a map's
topicref's <topicmeta> element.
GENERAL. Fundamentally, what is an index term? In a topic architecture, I'd submit that
we should regard an index term as a semantic label attached to a unit of content
(such as a phrase, paragraph, list, table, section, topic, or collection of
topics). We should not regard an index term as attached to a point within a
discourse flow because a point doesn't have any meaning.
"This paragraph is about application servers."
That's true regardless of where the index
term appears within the paragraph. To indicate that the index term applies only
to a sentence, the writer could wrap a <ph> element around the indexed
sentence. That is, the container of the index term defines the unit of content
that's about application servers.
PAGE RANGES. From that
perspective, we shouldn't need start and end markers for a range. By definition,
the container specifies the range for the indexed unit of content. (For an index
marker within a prolog, the effective container is the topic.)
The formatter might apply the rule that, if the container spans more than one or two
pages (or some threshold controlled by a style policy), the generated index
shows a page range. Otherwise, the formatter emits the start page for the
container. That way, the writer doesn't have to maintain page ranges
depending on the output. If the writer starts with an index marker on a section
but adds content to the section until it stretches to three pages (shudder), the
writer doesn't have to change the index marker to start and end markers. If the
section fits on a single page when output as 8 1/2 by 11 but flows over three
pages when output as A5 (or whatever), the writer doesn't have to revise the
topic depending on the output.
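The formatter rule described above could be sketched roughly as follows in Python. The function name `page_reference` and the default threshold of 2 are assumptions made for illustration; the actual threshold would come from whatever style policy the formatter uses.

```python
def page_reference(start_page, end_page, threshold=2):
    """Return the index locator for a container laid out on start_page..end_page.

    threshold is a hypothetical style-policy setting: a container that
    occupies more pages than this is shown as a page range, otherwise
    only its start page is shown.
    """
    pages_spanned = end_page - start_page + 1
    if pages_spanned > threshold:
        return f"{start_page}-{end_page}"
    return str(start_page)

# The same section, unchanged in the source, under two page layouts:
print(page_reference(12, 12))  # fits on one page in the larger format
print(page_reference(20, 22))  # flows over three pages in the smaller format
```

The point of the sketch is that the decision is made from the pagination at output time, so the source markup is the same for both layouts.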
In the implementation, during the topic
merge phase, the preprocessor could insert processing instructions at the start
and end of the container if convenient for easy processing of the
If you find yourself wanting to index a range of content that's a
subset of a container, you should ask yourself whether the content merits a
container. That is, requiring that semantic units have containers is consistent
with the topic-oriented approach of assembling larger structures from small,
granular, typed units of content.
In passing, the same ambiguity that
came up for the <data> element rears its ugly head here. If I put an index
marker within a topicmeta for a topicref, should the range be the referenced
topic or the entire branch of the map? Do we need a systematic way to
distinguish the properties of the referenced topic from the properties of the
branch?
SEE vs SEE ALSO. I'm wondering if we
could produce both outputs correctly from a single element that expresses
synonyms for index terms. As I understand the publishing convention for "see"
and "see also," the correct tag depends on which terms have instances:
- If both the source term and target term for the synonym have instances,
the formatter should generate a "see also" on the source.
- If only the target term for the synonym has instances, the formatter
should generate a "see" on the source.
- If the target term for the synonym doesn't have instances, the formatter
should ignore the synonym (and potentially generate a warning).
In other words, the same synonym might be a "see" or "see also" or nothing,
depending on whether the aggregating map has assembled topics that have
instances of the source and target term.
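The three rules above can be sketched as a small Python function. The name `resolve_synonym` and the `instance_counts` mapping (term to number of indexed instances in the assembled map) are assumptions for illustration, not part of any DITA processor.

```python
def resolve_synonym(source, target, instance_counts):
    """Decide how a synonym from source to target should appear in the index.

    instance_counts maps each index term to the number of instances found
    in the topics assembled by the map. Returns "see also", "see", or None
    (None means the synonym is ignored and the caller may emit a warning).
    """
    has_source = instance_counts.get(source, 0) > 0
    has_target = instance_counts.get(target, 0) > 0
    if not has_target:
        return None          # nothing to point at
    if has_source:
        return "see also"    # both terms have page references
    return "see"             # only the target has page references

counts = {"app servers": 0, "application servers": 4, "web servers": 2}
print(resolve_synonym("app servers", "application servers", counts))
print(resolve_synonym("web servers", "application servers", counts))
print(resolve_synonym("web servers", "proxy servers", counts))
```

Because the result depends only on the counts for the assembled map, the same synonym declaration can resolve differently in different deliverables.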
GLOBAL SORTS AND SYNONYMS. I'd submit that it should be possible to declare sort keys and see
/ see also synonyms as global definitions rather than definitions associated
with specific instances.
After all, what if an index term has a sort key
in one instance and either no sort key or a different sort key in another
instance of the index term?
Also, when the output is generated, a see /
see also synonym applies to every instance of the index term rather than to a
specific instance. Finally, the most typical reason for defining synonyms is to
identify related content. Because the map controls the assembly of content,
synonyms would sensibly be an aspect of assembly.
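As a rough illustration of the per-instance problem, the following Python sketch reports index terms whose instances disagree about the sort key. The function name and the `(term, sort_key)` tuple shape are hypothetical; `None` stands for an instance with no sort key.

```python
from collections import defaultdict

def conflicting_sort_keys(instances):
    """Return terms whose instances carry more than one distinct sort key.

    instances is an iterable of (term, sort_key) pairs, where sort_key is
    None when that instance declares no sort key.
    """
    keys_by_term = defaultdict(set)
    for term, sort_key in instances:
        keys_by_term[term].add(sort_key)
    return {term: keys for term, keys in keys_by_term.items() if len(keys) > 1}

instances = [
    ("<ph> element", "ph element"),   # one topic supplies a sort key
    ("<ph> element", None),           # another topic omits it
    ("application servers", None),
]
print(conflicting_sort_keys(instances))
```

With a single global definition per term there would be nothing for such a check to find.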
Perhaps it would make
sense to define sorts and synonyms within the <keywords> element. That
way, the common case (global definitions of sorts and synonyms) is easy, the
edge case (content that requires sorts or synonyms) is awkward but possible, and
index terms embedded within content don't provide a bulky distraction from the
content. Maybe something like the following:
<!-- sort applied at any level -->
<!-- maybe an optional attribute to enable a bidirectional synonym? -->
In passing, the keyref proposal (#40)
should make it possible to index the topic content but assign the labels to
those index terms in the map. Producing a good index often requires adjusting
the labels based on the labels of the other indexed content. Having to go back
into the content to align index terms is an enormous pain and an inhibitor for
reuse -- especially if you'd like to freeze the content but perform final
production on the index.
MISCELLANEOUS. I'd agree with Paul
that, with <indexterm> (as with <section>), there's an implied
structure on the content that can only be validated by an XML parser when the
grammar can impose constraints on mixed content models. Regarding linking, a
generated index in HTML or PDF output should have links to the instances of the
index terms. I suppose the instances of a term could sensibly link to one
another as a convenience (if the hotspot isn't too distracting).
What do you think?
"Chris Wong" <email@example.com> wrote on 09/28/2005 09:15:28 AM:
> I'm kind of surprised to see no questions or objections so far to
> this proposal. I hear that people can have strong opinions about
> this subject. I'd like to see any debate get underway so we will
> have time to move this issue forward. Anyone?