
Subject: RE: [dita] Groups - DITA 1.1 Issue #45: Add See, See Also indexing elements (IssueNumber45.html) uploaded

Hi Paul and Chris,
I have been writing this the last few days. I was going to add more but
I'll send it out now.


Indexing use case:

Basic index structure:

Primary term
	Secondary term
		Tertiary term

A basic index requires three levels of terms: Primary, secondary,
tertiary. Some indexes may consist of more than three levels but that is
not recommended as a best practice.
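
As a rough illustration of the three-level structure above, here is a minimal Python sketch that builds a nested index from (term path, page) pairs and prints it with tab indentation. The function names, the sample terms, and the page numbers are hypothetical, and the hard limit of three levels is this sketch's own policy choice, since deeper indexes are only discouraged, not forbidden.

```python
def build_index(entries, max_levels=3):
    """Build a nested index from (path, page) pairs.

    Each path is a tuple of terms: primary, then secondary, then tertiary.
    max_levels is an assumption of this sketch, not a DITA rule.
    """
    index = {}
    for path, page in entries:
        if not 1 <= len(path) <= max_levels:
            raise ValueError(f"expected 1 to {max_levels} levels, got {len(path)}")
        node = None
        level = index
        for term in path:
            # Create the entry on first sight, then descend into its children.
            node = level.setdefault(term, {"pages": [], "children": {}})
            level = node["children"]
        node["pages"].append(page)
    return index


def render(index, depth=0):
    """Return the index as a list of lines, one tab per nesting level."""
    lines = []
    for term in sorted(index):
        node = index[term]
        pages = ", ".join(str(p) for p in node["pages"])
        lines.append("\t" * depth + term + (", " + pages if pages else ""))
        lines.extend(render(node["children"], depth + 1))
    return lines


# Hypothetical sample data.
entries = [
    (("Goldfish",), 12),
    (("Goldfish", "feeding"), 14),
    (("Goldfish", "feeding", "frequency"), 15),
]
print("\n".join(render(build_index(entries))))
```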

"see" index structure:

A "see" index reference is designed to refer the reader to the
controlled vocabulary term used in the text. Typically the index term is
a synonym or is otherwise equivalent to the controlled vocabulary term.
The index typically does not list a page number for the synonym but
refers the reader to the controlled term for the correct page number.

"see also" index structure:

A "see also" index reference is designed to suggest an additional
controlled term in relationship to the target controlled term. The
target controlled term does include a page reference. Typically you
don't mix see and see also structures. The see also reference should
occur with the target index term rather than with the synonym.
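
To make the difference between the two structures concrete, here is a small Python sketch of how a rendered entry might look in each case. The function name, the punctuation of the output, and the sample terms are all assumptions for illustration. The sketch also rejects a "see" entry that carries page numbers or "see also" references, which is stricter than "typically you don't mix"; a real processor might only warn.

```python
def render_entry(term, pages=(), see=None, see_also=()):
    """Render one index entry as a single line of text."""
    if see is not None:
        # A "see" entry is a synonym: no page numbers, only a redirect.
        if pages or see_also:
            raise ValueError('a "see" entry should not carry pages or "see also"')
        return f"{term}. See {see}"
    line = term
    if pages:
        line += ", " + ", ".join(str(p) for p in pages)
    if see_also:
        # "See also" sits on the target term, next to its page numbers.
        line += ". See also " + "; ".join(see_also)
    return line


# Hypothetical sample entries.
print(render_entry("Feeding goldfish", see="Goldfish feeding"))
print(render_entry("Goldfish feeding", pages=[14], see_also=["Aquarium maintenance"]))
```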

"page range" index structure:

Indexers use page ranges to indicate that an important, high-level topic
is covered over a number of pages. Page ranges are applicable to books
rather than HTML or help systems that refer to topics rather than
sections of books. This requirement may be difficult to implement
through a range of topics in a map or a bookmap.
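
For the rendering side only, here is a minimal Python sketch that collapses consecutive page numbers into ranges. It assumes the page numbers are already known after pagination, and it says nothing about how a range would be marked up in the source. The function name and the sample pages are hypothetical.

```python
def collapse_pages(pages):
    """Turn page numbers into a string with consecutive runs shown as ranges."""
    runs = []
    run_start = run_end = None
    for page in sorted(set(pages)):
        if run_start is None:
            run_start = run_end = page
        elif page == run_end + 1:
            run_end = page  # extend the current run
        else:
            runs.append((run_start, run_end))
            run_start = run_end = page
    if run_start is not None:
        runs.append((run_start, run_end))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


print(collapse_pages([46, 47, 48, 49, 12, 60]))  # prints: 12, 46-49, 60
```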

"index sort":
Index sort sequences may vary and cause problems with translations. Many
indexes in languages other than English tend to be incorrectly sorted
because of characters that do not occur in English. The tendency is to
misplace these characters at the end of the sort rather than where they
belong in the minds of the readers of the target language.
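
The misplacement is easy to reproduce. The Python sketch below compares a plain code point sort with a sort on an accent-folded key. The folding key is only an approximation chosen for illustration: it suits languages where an accented letter sorts with its base letter, and it is wrong for others (Swedish, for example, places ä after z). A real processor would need language-specific collation rules. The function name and sample terms are hypothetical.

```python
import unicodedata


def fold_key(term):
    """Sort key that ignores accents and case (an approximation, see above)."""
    decomposed = unicodedata.normalize("NFD", term)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The original term is a tie-breaker so the order stays deterministic.
    return (base.casefold(), term)


terms = ["zone", "étiquette", "apple", "Ärger"]

# Code point order pushes the accented terms to the end.
print(sorted(terms))                # ['apple', 'zone', 'Ärger', 'étiquette']

# Folded order places them next to their base letters.
print(sorted(terms, key=fold_key))  # ['apple', 'Ärger', 'étiquette', 'zone']
```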

"index term linking"

We can look at this in two ways, which reflect best practices in some of
the more sophisticated index tools. First, as an index is being edited,
a best practice is to link the index term and a single page number back
to the actual index term embedded in the text. The reason is to find and
correct the index term (spelling, change level, etc). Second, for an
automated index, you want to be able to go from the index term in the
final rendering to the page on which the indexed content occurs. In help
indexes, index items link to the topic level, but in PDF indexes the
link should go to the paragraph level or as close to the actual index
term placement as possible.

JoAnn T. Hackos, PhD
Comtech Services, Inc.
710 Kipling Street, Suite 400
Denver CO 80215

-----Original Message-----
From: Grosso, Paul [mailto:pgrosso@ptc.com] 
Sent: Wednesday, September 28, 2005 12:08 PM
To: dita@lists.oasis-open.org
Subject: RE: [dita] Groups - DITA 1.1 Issue #45: Add See, See Also
indexing elements (IssueNumber45.html) uploaded

> -----Original Message-----
> From: Chris Wong [mailto:cwong@idiominc.com] 
> Sent: Wednesday, 2005 September 28 11:15
> To: dita@lists.oasis-open.org
> Subject: RE: [dita] Groups - DITA 1.1 Issue #45: Add See, See 
> Also indexing elements (IssueNumber45.html) uploaded
> I'm kind of surprised to see no questions or objections so 
> far to this proposal. I hear that people can have strong 
> opinions about this subject. I'd like to see any debate get 
> underway so we will have time to move this issue forward. Anyone?

There is something about indexterm (irrespective of
this current proposal) that has always concerned me:
its mixed content model.  Is something like:

<indexterm>Top level
  index term content.
</indexterm>

allowed (the DTD allows it)?  If so, what are the 
processing expectations?

Also, what are the processing expectations of

<indexterm>Top level
  <indexterm>Nested 1</indexterm>
  <indexterm>Nested 2</indexterm>
</indexterm>

(the DTD allows this too)?

More on this particular proposal

What is the suggested content model now for indexterm?
Indexterm already had a mixed content model, but now it
seems even "more mixed" (if such is possible).  Can one
have #PCDATA following <index-sort-as>...</index-sort-as>? 
If there is going to be an index-sort-as, will it always
be the first child element of the indexterm element?

Is one limited to at most one index-see or index-see-also?
If one has an index-see, can one have an index-see-also?
Is the semantic that if one has an index-see, one doesn't
show the page number on the parent indexterm, but otherwise
one does?

We currently have the following content model:

<!ELEMENT indexterm     (%words.cnt;|%indexterm;)*    >

I'm guessing we might want a content model something like:

<!ELEMENT indexterm     ((%words.cnt;)?,
         (index-see | index-see-also+)? , indexterm?) >

except you can't do that in XML, so we're probably going
to have to allow just a big mash of text and tags, and
write "application semantics" that say it's only dita-valid
if it matches the above non-XML content model.  Regardless,
the proposal needs to describe what is valid input and how
to handle all possible input.
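
To show what such "application semantics" might look like, here is a rough Python sketch of a checker for the guessed content model above. It is an illustration under several assumptions, not a proposed implementation: %words.cnt; is reduced to plain leading text (phrase-level elements would be flagged as unexpected), the element names are taken from the proposal as discussed here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_indexterm(elem):
    """Return a list of problems; an empty list means the element fits."""
    problems = []
    children = list(elem)
    # Text is only accepted before the first child element.
    for child in children:
        if child.tail and child.tail.strip():
            problems.append(f"text after <{child.tag}>")
    tags = [child.tag for child in children]
    i = 0
    # Either one index-see, or any number of index-see-also.
    if i < len(tags) and tags[i] == "index-see":
        i += 1
    else:
        while i < len(tags) and tags[i] == "index-see-also":
            i += 1
    # Then at most one nested indexterm, checked recursively.
    if i < len(tags) and tags[i] == "indexterm":
        problems.extend(check_indexterm(children[i]))
        i += 1
    if i < len(tags):
        problems.append(f"unexpected <{tags[i]}> at child position {i + 1}")
    return problems


ok = ET.fromstring("<indexterm>Carp<index-see>Goldfish</index-see></indexterm>")
two_nested = ET.fromstring(
    "<indexterm>Top level"
    "<indexterm>Nested 1</indexterm>"
    "<indexterm>Nested 2</indexterm>"
    "</indexterm>"
)
print(check_indexterm(ok))
print(check_indexterm(two_nested))
```

Note that under this guessed model the two-sibling nesting case is rejected, because the model allows at most one nested indexterm.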

The entire discussion of "linking to other indexterms"
confuses me.  I don't see any linking to indexterms.
There are just indexterms scattered throughout the content,
and when the index is automatically generated, entries
therein pick up the appropriate page numbers and possibly
link to the point in the result where the indexterm element
was found, but there are no links to the indexterms.
Perhaps it's just the wording that confuses me, but it
makes no sense to me to say, for example:

  ...the reference to "Goldfish feeding" points to a
  nested indexterm.  We need to define an identifier
  that a redirection element such as index-see can use
  to point to something yet to be generated.
//I don't understand this either, JoAnn//

Page ranges make me nervous.  They are difficult
to implement correctly, and they are easy to use
incorrectly.  Especially given that <index-range-start/>
and <index-range-end/> are unpaired singleton tags,
it's easy for a user to use them in ways that aren't
going to be valid.

I'm not sure what user requirement is being addressed
by ranges.  Is it just to be able to get something like
46-49 in the index, or is it to allow a user to just
indicate a startpoint and endpoint in the source without
having to insert individual indexterm elements on each page?
The former is just an implementation issue and shouldn't
drive our markup, but I can see the point of the latter.
But we do have to ask, then, if the benefit of this is
enough to offset the problems.
//this second case is not one I've ever seen in any markup for indexes,
