Subject: RE: [dita] some index-range-* issues
From: "Yas Etessam" <yas.etessam@xmetal.com>
To: "Chris Wong" <cwong@idiominc.com>,<dita@lists.oasis-open.org>
Date: Wed, 9 Aug 2006 11:50:08 -0700
One is less error-prone than the other.

An explicit reference avoids issues around white space, capitalization,
diacritics, misspelled words and accidental mismatches. 

Whether one uses a commercial XML editor or Vi for authoring the tags,
there's a smaller margin for error. It becomes easier to identify
orphans, and there's less maintenance: if a user decides to change the
content of the start index tag, they won't have to go and change the
content of the end range tag as well.
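The failure modes listed above are easy to reproduce. This short Python sketch pairs a start and an end by comparing their text exactly, as any content-based matching scheme would have to. The helpers content_match and nfc are hypothetical, not part of any DITA tool, and "Époisses" is just a sample term that carries a diacritic.

```python
import unicodedata

def content_match(start_text: str, end_text: str) -> bool:
    """Content-based pairing: the end tag must repeat the start text exactly."""
    return start_text == end_text

def nfc(text: str) -> str:
    """Unicode NFC normalization (composes base letters with combining accents)."""
    return unicodedata.normalize("NFC", text)

composed = "\u00c9poisses"      # É as one precomposed code point
decomposed = "E\u0301poisses"   # E followed by a combining acute accent

# Pairs an author could easily believe are "the same" term.
pairs = [
    ("pecorino", "pecorino "),  # trailing white space
    ("pecorino", "Pecorino"),   # capitalization
    (composed, decomposed),     # identical on screen, different code points
    ("pecorino", "pecorinno"),  # misspelling
]

for start, end in pairs:
    print(repr(start), repr(end), content_match(start, end))  # all False

# Normalization rescues the diacritics case only; case and spelling
# differences still defeat an exact comparison.
print(content_match(nfc(composed), nfc(decomposed)))  # True
```

An attribute token such as subject="pecorino" avoids all four cases, because the term text itself is never compared.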

I agree that both reduce to the same constructs, and that there will be
subtle differences depending on whether a user has to manually type in
an attribute value for the index range in Vi or whether a tool is smart
enough to auto-generate that attribute value.

- Yas

-----Original Message-----
From: Chris Wong [mailto:cwong@idiominc.com] 
Sent: Wednesday, August 09, 2006 6:41 AM
To: Yas Etessam; dita@lists.oasis-open.org
Subject: RE: [dita] some index-range-* issues

Yas, you are on record as favoring the attribute proposal for
implementation purposes. I'm wondering why the existing proposal is hard
to implement. They are all XML, after all, so internally you would just
reduce them to some ID. 

<indexterm>foo<indexterm>bar<index-range-start/></indexterm></indexterm>

And

<index-range-start subject="foo:bar">foo<indexterm>bar</indexterm></index-range-start>

Both reduce to (inventing my own pseudo-code):

index-range-start(foo:bar)

The test would be to make sure index-range-end(foo:bar) also exists. Why
is one syntax harder than the other? The FO plugin for the DITA Open
Toolkit already detects mismatched index page ranges today.
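The pseudo-code above can be made concrete. The following Python sketch uses only the standard library's xml.etree.ElementTree to reduce both syntaxes to the same (kind, key) pair and then checks that every start has an end. It is an independent illustration, not how the Open Toolkit's FO plugin works; the function names and the choice to join nested terms with ":" are assumptions chosen to match the "foo:bar" notation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def term_path(indexterm: ET.Element) -> list[str]:
    """Texts of an indexterm and its nested indexterms, outermost first.

    Assumes at most one nested indexterm per level, as in the examples.
    """
    path = [(indexterm.text or "").strip()]
    child = indexterm.find("indexterm")
    if child is not None:
        path.extend(term_path(child))
    return path

def reduce_marker(elem: ET.Element) -> Optional[tuple[str, str]]:
    """Reduce either syntax to (kind, key), e.g. ("start", "foo:bar")."""
    if elem.tag in ("index-range-start", "index-range-end"):
        # Attribute syntax: the subject attribute is the key.
        return elem.tag.removeprefix("index-range-"), elem.get("subject", "")
    if elem.tag == "indexterm":
        # Flag syntax: an empty flag somewhere inside marks the whole term path.
        # (An indexterm holding both flags is not handled; start wins here.)
        for kind in ("start", "end"):
            if elem.find(f".//index-range-{kind}") is not None:
                return kind, ":".join(term_path(elem))
    return None

def unmatched_starts(fragments: list[str]) -> set[str]:
    """Keys with a range start but no range end anywhere (order is ignored)."""
    events = [ev for f in fragments if (ev := reduce_marker(ET.fromstring(f)))]
    starts = {key for kind, key in events if kind == "start"}
    ends = {key for kind, key in events if kind == "end"}
    return starts - ends

flag_start = "<indexterm>foo<indexterm>bar<index-range-start/></indexterm></indexterm>"
attr_start = ('<index-range-start subject="foo:bar">foo'
              "<indexterm>bar</indexterm></index-range-start>")
attr_end = '<index-range-end subject="foo:bar"/>'

print(reduce_marker(ET.fromstring(flag_start)))  # ('start', 'foo:bar')
print(reduce_marker(ET.fromstring(attr_start)))  # ('start', 'foo:bar')
print(unmatched_starts([attr_start, attr_end]))  # set()
print(unmatched_starts([flag_start]))            # {'foo:bar'}
```

Once reduced, the matching test is the same for both syntaxes; the difference lies in how the key is obtained (computed from content versus read from an attribute).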

The other issue that both you and Paul G are eminently qualified to
address is the usability issue. The proposed change to index ranges is
essentially a switch from element/content-based authoring to an
element/@attribute approach. In effect, you require the author to
come up with an ID to assign to an attribute. Existing XML authoring
tools simply don't make attribute editing/viewing easy. Everything I've
heard about user friendly XML authoring says to avoid authoring
attributes (which are usually invisible), let alone coming up with IDs
for those invisible attributes, let alone authoring matching pairs of
those invented IDs for those invisible attributes. Won't this approach
be unacceptably unfriendly to your users?

Chris


-----Original Message-----
From: Yas Etessam [mailto:yas.etessam@xmetal.com]
Sent: Tuesday, August 08, 2006 9:26 PM
To: dita@lists.oasis-open.org
Subject: RE: [dita] some index-range-* issues

We can avoid cross-topic ranges with guidelines telling users not to
insert orphaned range tags into their topics. Paul G's new markup
proposal will create XML with enough information that implementors
could warn users when they've got orphaned tags. Even though the DTD
can't prevent orphaned tags, XML editors could theoretically warn users
about that scenario.


In terms of what we want to support, it doesn't make much sense to
modify DITA to accommodate a 'poor indexing practice' if the best
practice is to simply show the start page. In terms of technology, it
wouldn't be difficult to add some type of "range" attribute on an
indexterm within topicmeta to tell an output process to include both
the start and end pages, but the real question is "should we?" JoAnn
seems to be suggesting that we should try to encode best practices
within our data model as opposed to being overly accommodating.

- Yas Etessam


________________________________

From: JoAnn Hackos [mailto:joann.hackos@comtech-serv.com]
Sent: Tuesday, August 08, 2006 2:47 PM
To: Dana Spradley; Grosso, Paul
Cc: dita@lists.oasis-open.org
Subject: RE: [dita] some index-range-* issues



Dana,

You're echoing my thoughts, reflected in this earlier memo.

 

Perhaps the additional confusion here is moving across topics. It would
seem better to avoid cross-topic indexing ranges completely. Would that
still be an option? 

 

I think the current state of the proposals in both cases tries to
accommodate poor indexing practices that ignore the usability of an
index for actual readers. The simplest method is to give the page number
only for the first page of a longer item, letting the reader decide when
he has had enough. Some indexers use ff (folios or numbers of pages) to
indicate a longer discussion beginning on a page, such as 356ff with
the ff in italic. There is something problematic, it seems, about having
page ranges that span topics, given our case for the standalone nature
of a topic.
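The three locator styles mentioned above differ in how much the output process must know: the first-page style ignores the end page entirely, "ff" only needs to know that the discussion continues, and only an explicit range needs the exact end page. A small Python sketch makes this concrete; the function name format_locator and the style labels are hypothetical, and it assumes the page numbers are already known after layout.

```python
STYLES = ("first", "ff", "range")

def format_locator(start: int, end: int, style: str = "first") -> str:
    """Render the locator for a discussion running from page start to page end.

    "first": first page only, leaving the reader to decide when to stop
    "ff":    first page followed by "ff" when the discussion continues
    "range": explicit start-end range
    """
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    if end < start:
        raise ValueError("end page precedes start page")
    if style == "first" or end == start:
        return str(start)
    if style == "ff":
        return f"{start}ff"  # a typesetter would set "ff" in italic
    return f"{start}-{end}"

for style in STYLES:
    print(style, format_locator(356, 358, style))
# first 356
# ff 356ff
# range 356-358
```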

 

Anyway -- just a few thoughts on the philosophy behind the technical
debate.

JoAnn

 

 

JoAnn T. Hackos, PhD
President
Comtech Services, Inc.
710 Kipling Street, Suite 400
Denver, CO 80215
303-232-7586
joann.hackos@comtech-serv.com
joannhackos Skype

www.comtech-serv.com

________________________________

From: Dana Spradley [mailto:dana.spradley@oracle.com]
Sent: Tuesday, August 08, 2006 12:34 PM
To: Grosso, Paul
Cc: dita@lists.oasis-open.org
Subject: Re: [dita] some index-range-* issues

 

Even after this morning's discussion, I like Paul's idea, although I
personally wouldn't allow mixed content in index-range-start; instead, I
would wrap the top-level term in an indexterm element.

On the other hand, what is an index range supposed to mean when you come
across one in an index? 

I always thought it meant that's where an extended discussion of that
topic occurs in the book.

DITA being a topic-oriented architecture, it would seem more appropriate
to put indexterms that apply to the entire topic somewhere in the
metadata for that topic, and only construct index ranges for those.
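That suggestion can be sketched in Python. If indexterms that apply to a whole topic live in its metadata, the output process can derive the range from the topic's own first and last page after layout, so authors never mark range starts or ends, and no range can cross a topic boundary. The Topic class, the build_index function, and the flat "term; subterm" strings are hypothetical simplifications, not DITA or Open Toolkit structures.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Hypothetical view of one topic after page layout."""
    first_page: int
    last_page: int
    meta_terms: list[str] = field(default_factory=list)  # indexterms in topic metadata
    body_terms: list[tuple[str, int]] = field(default_factory=list)  # (term, page)

def build_index(topics: list[Topic]) -> dict[str, list[str]]:
    """Map each term to its locators; only metadata terms get page ranges."""
    index: dict[str, list[str]] = {}
    for topic in topics:
        if topic.first_page == topic.last_page:
            topic_range = str(topic.first_page)
        else:
            topic_range = f"{topic.first_page}-{topic.last_page}"
        for term in topic.meta_terms:
            # A metadata term covers the whole topic, so the range is the
            # topic's own extent and can never cross into another topic.
            index.setdefault(term, []).append(topic_range)
        for term, page in topic.body_terms:
            # A term in the body points at the single page it falls on.
            index.setdefault(term, []).append(str(page))
    return index

topics = [
    Topic(22, 24, meta_terms=["cheese; sheeps milk cheeses; pecorino"],
          body_terms=[("cheese", 23)]),
    Topic(25, 25, meta_terms=["cheese; hard cheeses"]),
]
for term, locators in build_index(topics).items():
    print(term, ", ".join(locators))
# cheese; sheeps milk cheeses; pecorino 22-24
# cheese 23
# cheese; hard cheeses 25
```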

--Dana

Grosso, Paul wrote:



I'm resending this email to the list since it never made it.
I have deleted some parts that are no longer at issue.
I hope to follow up with another email with another proposal.
 
Issues
======
 
The currently proposed index-range-* elements are just empty "flags"
that get put inside an indexterm element.  
But it is not necessarily clear what this means in the case of nested
indexterms.
 
For example, per my best understanding, one way to indicate a page range
for my "pecorino" example would be markup such as the following (where
the comments just indicate what pages each indexterm falls on):
 
. . .
<!-- page 22 -->
<indexterm>cheese
  <indexterm>sheeps milk cheeses
    <indexterm>pecorino<index-range-start/></indexterm>
  </indexterm>
</indexterm>
. . .
<!-- page 24 -->
<indexterm>cheese
  <indexterm>sheeps milk cheeses
    <indexterm>pecorino<index-range-end/></indexterm>
  </indexterm>
</indexterm>
. . .
 
But what if the <index-range-start/> is placed elsewhere in the first
indexterm, such as:
 
<!-- page 22 -->
<indexterm>cheese<index-range-start/>
  <indexterm>sheeps milk cheeses
    <indexterm>pecorino</indexterm>
  </indexterm>
</indexterm>
 
Is that equivalent, does it mean something else, or is it an error?  (My
best guess is that it should be equivalent.)
 
What about the following:
 
<indexterm>cheese<index-range-start/></indexterm>
. . .
<indexterm>cheese<index-range-end/>
  <indexterm>sheeps milk cheeses
  </indexterm>
</indexterm>
 
Since the first is an index reference for "cheese" and the second is one
for "cheese;sheeps milk cheeses", my best guess is these two do not
constitute a matched pair.
 
What about the following:
 
<indexterm>cheese<index-range-start/>
  <indexterm>sheeps milk cheeses<index-range-end/>
  </indexterm>
</indexterm>
. . .
<indexterm>cheese<index-range-end/>
  <indexterm>sheeps milk cheeses
  </indexterm>
</indexterm>
 
Is the first indexterm a range start or range end (or just an error)?
If it is a range start, does it end immediately, or is its range-end
ignored, and the range is ended by the subsequent indexterm?
 
None of this is made clear in the current writeup.
 
Also, I think this is very confusing and error-prone for users.
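Any implementation of the flag design has to pick answers to these questions. This Python sketch encodes the best guesses stated above: the flag's position within the nesting does not matter, and a start and an end pair up only when their full term paths are identical. It adds one assumption of its own, since the questions above leave this case open: an indexterm carrying both flags (or neither) is rejected. The interpret function is hypothetical; the ";" separator follows the "cheese;sheeps milk cheeses" notation used above.

```python
import xml.etree.ElementTree as ET

def interpret(markup: str) -> tuple[str, str]:
    """Return (kind, path) for one flagged indexterm, e.g. ("start", "cheese;pecorino").

    Best-guess reading: the flag marks the full chain of nested terms,
    wherever in the chain it appears. Raises ValueError if the indexterm
    carries no flag or both flags (the second case is an assumption).
    """
    root = ET.fromstring(markup)
    path, node = [], root
    while node is not None:  # follow the nesting down to the innermost term
        path.append((node.text or "").strip())
        node = node.find("indexterm")
    kinds = {k for k in ("start", "end")
             if root.find(f".//index-range-{k}") is not None}
    if len(kinds) != 1:
        raise ValueError(f"need exactly one kind of range flag, found {sorted(kinds)}")
    return kinds.pop(), ";".join(path)

flag_inner = ("<indexterm>cheese<indexterm>sheeps milk cheeses"
              "<indexterm>pecorino<index-range-start/></indexterm></indexterm></indexterm>")
flag_outer = ("<indexterm>cheese<index-range-start/><indexterm>sheeps milk cheeses"
              "<indexterm>pecorino</indexterm></indexterm></indexterm>")
print(interpret(flag_inner) == interpret(flag_outer))  # True: equivalent

cheese_start = interpret("<indexterm>cheese<index-range-start/></indexterm>")
sheep_end = interpret("<indexterm>cheese<index-range-end/>"
                      "<indexterm>sheeps milk cheeses</indexterm></indexterm>")
print(cheese_start[1] == sheep_end[1])  # False: different references, not a pair

try:
    interpret("<indexterm>cheese<index-range-start/><indexterm>sheeps milk cheeses"
              "<index-range-end/></indexterm></indexterm>")
except ValueError as err:
    print("rejected:", err)
```

Every one of these rules is a policy choice the current writeup does not state, which is the ambiguity problem in a nutshell.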
 
Potential solution
==================
 
Rather than having empty index-range-* elements that magically redefine
their parent to have different semantics, I think it would be preferable
to have a specialization of indexterm (or just another element) that can
be used to indicate the start of a range--so we would write something
like:
 
<index-range-start>cheese
  <indexterm>sheeps milk cheeses
    <indexterm>pecorino</indexterm>
  </indexterm>
</index-range-start>
 
to start the "cheese--sheeps milk cheeses--pecorino" range.
 
While in theory we could then have an analogous index-range-end element
with the identical nested indexterm content, I think that is another
mistake in the current proposal. The idea of creating matching pairs by
requiring identical content has already been pointed out as a
translation nightmare, and when you start to consider nested indexterms,
it becomes an even worse, error-prone mess, both for the user and the
implementors.
 
Instead, I would add an NMTOKEN attribute to both index-range-start and
index-range-end, and have index-range-end be an empty element that just
refers back to the start:
 
<index-range-start subject="pecorino">cheese
  <indexterm>sheeps milk cheeses
    <indexterm>pecorino</indexterm>
  </indexterm>
</index-range-start>
. . .
<index-range-end subject="pecorino"/>
 
The "subject" attribute would act like a sort of id/idref, but I've
avoided using real IDs because then, if you had two ranges that
discuss "pecorino", you couldn't reuse id="pecorino".
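The subject matching proposed here, together with the kind of orphan warning an XML editor could give, can be sketched in Python with the standard library's ElementTree. This is an illustration under stated assumptions: markers are matched in document order, a subject becomes reusable once its range has ended, restarting a still-open subject is reported rather than treated as nesting, and the NMTOKEN check is only a rough regular-expression approximation of the XML rule. The check_ranges function and the <doc> wrapper element are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rough approximation of the XML NMTOKEN production (name characters, no spaces).
NMTOKEN = re.compile(r"[\w.:\-]+")

def check_ranges(doc: ET.Element) -> list[str]:
    """Return warnings for range markers, matching starts and ends by subject.

    A subject may be reused once its previous range has ended, which is why
    it behaves like a reusable token rather than a unique ID.
    """
    warnings, open_ranges = [], set()
    for elem in doc.iter():  # document order
        if elem.tag not in ("index-range-start", "index-range-end"):
            continue
        subject = elem.get("subject", "")
        if not NMTOKEN.fullmatch(subject):
            warnings.append(f"{elem.tag}: invalid subject {subject!r}")
            continue
        if elem.tag == "index-range-start":
            if subject in open_ranges:
                warnings.append(f"range {subject!r} started again before it ended")
            open_ranges.add(subject)
        elif subject in open_ranges:
            open_ranges.remove(subject)
        else:
            warnings.append(f"orphaned index-range-end {subject!r}")
    warnings += [f"orphaned index-range-start {s!r}" for s in sorted(open_ranges)]
    return warnings

doc = ET.fromstring("""<doc>
  <index-range-start subject="pecorino">cheese
    <indexterm>sheeps milk cheeses<indexterm>pecorino</indexterm></indexterm>
  </index-range-start>
  <index-range-end subject="pecorino"/>
  <index-range-start subject="pecorino">cheese<indexterm>pecorino</indexterm></index-range-start>
  <index-range-end subject="pecorino"/>
  <index-range-end subject="brie"/>
  <index-range-start subject="feta">cheese<indexterm>feta</indexterm></index-range-start>
</doc>""")
for warning in check_ranges(doc):
    print(warning)
# orphaned index-range-end 'brie'
# orphaned index-range-start 'feta'
```

Because matching uses a short token rather than the term text, editing or translating the terms inside index-range-start does not break the pair, and the two "pecorino" ranges coexist without an ID clash.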
 
paul