Subject: RE: [dita] some index-range-* issues
From: "Grosso, Paul" <pgrosso@ptc.com>
To: <dita@lists.oasis-open.org>
Date: Wed, 9 Aug 2006 10:35:52 -0400

Chris,

Can you answer the questions I raised in my email
(under "Issues") that is at the bottom of this message?

Some more comments below.

> -----Original Message-----
> From: Chris Wong [mailto:cwong@idiominc.com] 
> Sent: Wednesday, 2006 August 09 08:41
> To: Yas Etessam; dita@lists.oasis-open.org
> Subject: RE: [dita] some index-range-* issues
> 
> Yas, you are on record as favoring the attribute proposal for
> implementation purposes. I'm wondering why the existing proposal is hard
> to implement. They are all XML, after all, so internally you would just
> reduce them to some ID.
> 
> <indexterm>foo<indexterm>bar<index-range-start/></indexterm></indexterm>
> 
> 
> And 
> 
> <index-range-start
> subject="foo:bar">foo<indexterm>bar</indexterm></index-range-start>
> 
> Both reduce to (inventing my own pseudo-code):
> 
> index-range-start(foo:bar)

But the spec currently just talks about "paired" starts and
ends without defining what makes a start and an end a pair.
You seem to be imagining a fairly complex algorithm for
identifying "matching pairs" without specifying it.

Which of the following pairs match, and how do we explain
matching to both users and implementors?

<!-- 1 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <indexterm>bar<index-range-end/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>

<!-- 2 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo<index-range-end/>
  <indexterm>bar
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>

<!-- 3 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <indexterm>bar<index-range-end/>
    <indexterm>baz</indexterm>
  </indexterm>
</indexterm>

<!-- 4 Note &#xf6; is a numeric reference for the o-umlaut character -->
<indexterm>foö
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>fo&#xf6;
  <indexterm>bar<index-range-end/>
    <indexterm>baz</indexterm>
  </indexterm>
</indexterm>

<!-- 5 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <indexterm>bar <index-range-end/>
    <indexterm>baz</indexterm>
  </indexterm>
</indexterm>

<!-- 6 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <!-- some comment -->
  <indexterm>bar<index-range-end/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>

<!-- 7 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <? some PI ?>
  <indexterm>bar<index-range-end/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>

<!-- 8 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <indexterm>bar<index-range-end/>
    <indexterm>BAZ<index-sort-as>baz</index-sort-as>
    </indexterm>
  </indexterm>
</indexterm>

<!-- 9 -->
<indexterm>foo
  <indexterm>bar<index-range-start/>
    <indexterm>baz
    </indexterm>
  </indexterm>
</indexterm>
<indexterm>foo
  <indexterm>bar<index-range-end/>
    <indexterm>baz<index-see>qrst</index-see>
    </indexterm>
  </indexterm>
</indexterm>
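
To make the ambiguity concrete, here is a minimal Python sketch of one
possible matching rule. It is not what the spec says. It assumes that the
key of a range flag is the path of term labels from the outermost
indexterm down to the indexterm that directly contains the flag, and that
anything nested deeper is ignored. The names label, range_key and is_pair
are hypothetical, as is the optional whitespace normalization.

```python
import xml.etree.ElementTree as ET

FLAGS = {"index-range-start": "start", "index-range-end": "end"}

def label(elem, normalize=True):
    """Direct text of an indexterm: its leading text plus the tails of its children."""
    raw = (elem.text or "") + "".join(child.tail or "" for child in elem)
    # Assumption: collapsing whitespace is the intended comparison rule.
    return " ".join(raw.split()) if normalize else raw

def range_key(xml_text, normalize=True):
    """Return (kind, path) for the first range flag found, or None if there is none."""
    def walk(elem, path):
        path = path + (label(elem, normalize),)
        # A flag that is a direct child of this indexterm applies to this level.
        for child in elem:
            if child.tag in FLAGS:
                return FLAGS[child.tag], path
        # Otherwise look for a flag in nested indexterms.
        for child in elem:
            if child.tag == "indexterm":
                found = walk(child, path)
                if found:
                    return found
        return None
    return walk(ET.fromstring(xml_text), ())

def is_pair(start_xml, end_xml, normalize=True):
    """True if the two indexterms form a start/end pair under the assumed rule."""
    s = range_key(start_xml, normalize)
    e = range_key(end_xml, normalize)
    return bool(s and e and s[0] == "start" and e[0] == "end" and s[1] == e[1])

START = ("<indexterm>foo<indexterm>bar<index-range-start/>"
         "<indexterm>baz</indexterm></indexterm></indexterm>")
END_2 = ("<indexterm>foo<index-range-end/><indexterm>bar"
         "<indexterm>baz</indexterm></indexterm></indexterm>")
END_5 = ("<indexterm>foo<indexterm>bar <index-range-end/>"
         "<indexterm>baz</indexterm></indexterm></indexterm>")
END_8 = ("<indexterm>foo<indexterm>bar<index-range-end/><indexterm>BAZ"
         "<index-sort-as>baz</index-sort-as></indexterm></indexterm></indexterm>")
START_4 = "<indexterm>fo\u00f6<indexterm>bar<index-range-start/></indexterm></indexterm>"
END_4 = "<indexterm>fo&#xf6;<indexterm>bar<index-range-end/></indexterm></indexterm>"

print(is_pair(START, END_2))                   # False: flag sits at a different level
print(is_pair(START, END_5))                   # True: trailing space is normalized away
print(is_pair(START, END_5, normalize=False))  # False: "bar" vs "bar "
print(is_pair(START, END_8))                   # True: content below the flag is ignored
print(is_pair(START_4, END_4))                 # True: the parser resolves &#xf6;
```

Each answer follows from a choice made in the sketch (whitespace handling,
ignoring deeper content, relying on the parser to drop comments and PIs),
and a different set of choices gives different answers. These are the
decisions the spec would have to state.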

> 
> The test would be to make sure index-range-end(foo:bar) also exists.
> Why is one syntax harder than the other? The FO plugin for the DITA Open
> Toolkit already detects mismatched index page ranges today.
> 
> The other issue that both you and Paul G are eminently qualified to
> address is the usability issue. The proposed change to index ranges is
> essentially a switch from element/content-based authoring to an
> element/@attribute approach. You essentially require that the author
> come up with an ID to assign to an attribute. Existing XML authoring
> tools simply don't make attribute editing/viewing easy. Everything I've
> heard about user friendly XML authoring says to avoid authoring
> attributes (which are usually invisible), let alone coming up with IDs
> for those invisible attributes, let alone authoring matching pairs of
> those invented IDs for those invisible attributes. Won't this approach
> be unacceptably unfriendly to your users?

I've never considered attributes unfriendly.  They are
a basic part of XML.

If the choice is between authoring an NMTOKEN attribute and
having to reproduce a potentially nested, mixed-content indexterm
construct (possibly containing other markup) identically in two
places, differing only in "index-range-start" versus
"index-range-end", and keeping the two copies identical through
translation, I'd think the attribute is much more user friendly.
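
For comparison, matching under the attribute-based alternative (an NMTOKEN
"subject" attribute on both index-range-start and index-range-end, as in
the proposal quoted at the bottom of this message) can be sketched in a
few lines of Python. This is only a sketch under assumptions. The
pair_ranges function and the <topic> wrapper are hypothetical, and it
assumes that ranges sharing a subject follow one another rather than
overlap, so that each end closes the currently open start with the same
subject.

```python
import xml.etree.ElementTree as ET

def pair_ranges(xml_text):
    """Pair index-range-start/-end elements by subject, in document order."""
    open_starts = {}        # subject -> position of the currently open start
    pairs, errors = [], []
    for pos, elem in enumerate(ET.fromstring(xml_text).iter()):
        subject = elem.get("subject")
        if elem.tag == "index-range-start":
            if subject in open_starts:
                errors.append(f"range {subject!r} started twice")
            open_starts[subject] = pos
        elif elem.tag == "index-range-end":
            if subject in open_starts:
                # Closing frees the subject so it can be reused later.
                pairs.append((subject, open_starts.pop(subject), pos))
            else:
                errors.append(f"end without start: {subject!r}")
    errors.extend(f"start without end: {s!r}" for s in open_starts)
    return pairs, errors

DOC = """<topic>
<index-range-start subject="pecorino">cheese<indexterm>sheeps milk cheeses<indexterm>pecorino</indexterm></indexterm></index-range-start>
<p>...</p>
<index-range-end subject="pecorino"/>
<index-range-start subject="pecorino">cheese</index-range-start>
<index-range-end subject="pecorino"/>
<index-range-end subject="feta"/>
</topic>"""

pairs, errors = pair_ranges(DOC)
print(pairs)   # two "pecorino" ranges, each with its start and end position
print(errors)  # one error for the unmatched "feta" end
```

No term content is compared at all, so whitespace, character references,
comments and translation of the index text cannot affect the pairing.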

paul

> 
> Chris
> 

> Grosso, Paul wrote:
> 
> 
> 
> I'm resending this email to the list since it never made it.
> I have deleted some parts that are no longer at issue.
> I hope to follow up with another email with another proposal.
>  
> Issues
> ======
>  
> The currently proposed index-range-* elements are just
> empty "flags" that get put inside an indexterm element.  
> But it is not necessarily clear what this means in the 
> case of nested indexterms.
>  
> For example, per my best understanding, one way to indicate 
> a page range for my "pecorino" example would be markup such 
> as the following (where the comments just indicate what pages 
> each indexterm falls on):
>  
> . . .
> <!-- page 22 -->
> <indexterm>cheese
>   <indexterm>sheeps milk cheeses
>     <indexterm>pecorino<index-range-start/></indexterm>
>   </indexterm>
> </indexterm>
> . . .
> <!-- page 24 -->
> <indexterm>cheese
>   <indexterm>sheeps milk cheeses
>     <indexterm>pecorino<index-range-end/></indexterm>
>   </indexterm>
> </indexterm>
> . . .
>  
> But what if the <index-range-start/> is placed elsewhere
> in the first indexterm, such as:
>  
> <!-- page 22 -->
> <indexterm>cheese<index-range-start/>
>   <indexterm>sheeps milk cheeses
>     <indexterm>pecorino</indexterm>
>   </indexterm>
> </indexterm>
>  
> Is that equivalent, does it mean something else, or is it
> an error?  (My best guess is that it should be equivalent.)
>  
> What about the following:
>  
> <indexterm>cheese<index-range-start/></indexterm>
> . . .
> <indexterm>cheese<index-range-end/>
>   <indexterm>sheeps milk cheeses
>   </indexterm>
> </indexterm>
>  
> Since the first is an index reference for "cheese" and
> the second is one for "cheese;sheeps milk cheeses", my
> best guess is these two do not constitute a matched pair.
>  
> What about the following:
>  
> <indexterm>cheese<index-range-start/>
>   <indexterm>sheeps milk cheeses<index-range-end/>
>   </indexterm>
> </indexterm>
> . . .
> <indexterm>cheese<index-range-end/>
>   <indexterm>sheeps milk cheeses
>   </indexterm>
> </indexterm>
>  
> Is the first indexterm a range start or range end
> (or just an error)?  If it is a range start, does 
> it end immediately, or is its range-end ignored, 
> and the range is ended by the subsequent indexterm?
>  
> None of this is made clear in the current writeup.
>  
> Also, I think this is very confusing and error-prone
> for users.
>  
> Potential solution
> ==================
>  
> Rather than having empty index-range-* elements that
> magically redefine their parent to have different
> semantics, I think it would be preferable to have a 
> specialization of indexterm (or just another element) 
> that can be used to indicate the start of a range--so 
> we would write something like:
>  
> <index-range-start>cheese
>   <indexterm>sheeps milk cheeses
>     <indexterm>pecorino</indexterm>
>   </indexterm>
> </index-range-start>
>  
> to start the "cheese--sheeps milk cheeses--pecorino" range.
>  
> While in theory we could then have an analogous 
> index-range-end element with the identical nested
> indexterm content, I think that is another mistake
> in the current proposal.  The idea of creating
> matching pairs by having to have identical content
> has already been pointed out as a translation
> nightmare, but when you start to consider nested
> indexterms, it's an even worse error-prone mess, 
> both for the user and the implementors.
>  
> Instead, I would add an NMTOKEN attribute to both
> index-range-start and index-range-end, and have
> index-range-end be an empty element that just 
> refers back to the start:
>  
> <index-range-start subject="pecorino">cheese
>   <indexterm>sheeps milk cheeses
>     <indexterm>pecorino</indexterm>
>   </indexterm>
> </index-range-start>
> . . .
> <index-range-end subject="pecorino"/>
>  
> The "subject" attribute would act like a sort of
> id/idref, but I've avoided really using IDs, because
> then if you have two ranges that discuss "pecorino",
> you couldn't reuse the id="pecorino". 
>  
> paul
>   
>