

Subject: Re: [dita] Question about how to define equivalent index entries


I agree that we have to allow processors to do whatever they want with index processing, but I think it's also reasonable to specify an expected set of behaviors for "normal" index processing.

I would have those include:

- Index terms are compared by normalizing white space and preserving case
- The use of <sort-as> does not affect the merging of index entries that have the same sort-as value. This is because the sort-as value should be *combined* with the base index term text to construct the complete sort key.
- Two index terms with the same base text and different sort-as values must be treated as an error, and the processor can recover as it chooses. That is, it cannot be sensible to have the same base term presented in two different places in the index, so it must be author error: either they used the incorrect sort key or they meant to use a different base term with an index-see.
- The merging of index entries where one entry is only a primary term and others are the primary term with secondary terms is processor dependent and processors should be encouraged to provide options for how to handle this case: separate entries, always merge, report a warning. Which behavior you want is an editorial choice.
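
To make the first three rules concrete, here is a rough Python sketch of how a processor might apply them. It is only an illustration, not how any existing processor works. The function names and the (base text, sort-as, location) tuple shape are made up for the example. It also assumes two things the rules above do not settle: that "combined" means the sort-as value is prepended to the base text, and that a term with no <sort-as> never conflicts with one that has it.

```python
from collections import defaultdict


def normalize(text):
    """Collapse runs of white space and trim both ends; case is preserved."""
    return " ".join(text.split())


def sort_key(base, sort_as=None):
    """Build the complete sort key.

    Assumption: the sort-as value is prepended to the base term text.
    """
    prefix = normalize(sort_as) if sort_as else ""
    return prefix + normalize(base)


def merge_index_terms(terms):
    """Merge index terms by normalized, case-sensitive base text.

    terms: list of (base_text, sort_as_or_None, location) tuples
           (a hypothetical shape chosen for this sketch).

    Returns (merged, conflicts):
      merged    maps base text -> list of locations, in input order
      conflicts maps base text -> sorted distinct sort-as values,
                only for base terms with more than one sort-as value
    """
    merged = defaultdict(list)
    sort_values = defaultdict(set)
    for base, sort_as, location in terms:
        key = normalize(base)
        merged[key].append(location)
        # Assumption: a missing <sort-as> is ignored rather than
        # treated as a conflicting value.
        if sort_as is not None:
            sort_values[key].add(normalize(sort_as))
    conflicts = {k: sorted(v) for k, v in sort_values.items() if len(v) > 1}
    return dict(merged), conflicts


terms = [
    ("data", None, "topic1"),
    ("data", "data", "topic2"),
    ("Data", "data", "topic3"),
    ("  oops ", None, "topic4"),
    ("oops", None, "topic5"),
    ("odd", "a", "topic6"),
    ("odd", "b", "topic7"),
]
merged, conflicts = merge_index_terms(terms)
```

With this input, "data" and "Data" stay separate entries because case is preserved, the two "oops" terms merge because only white space differs, and "odd" is reported as a conflict because it carries two different sort-as values. What the processor does with a conflict is left open, as in the third rule.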

In addition, there is the question of whether or not primary entries with secondary entries should be given page numbers. This is again an editorial choice that can be controlled either by authoring practice (never have primary-only entries for a term that also has secondary entries) or can be enforced by the processor with exceptions reported as warnings. As an example, the index of Mike Kay's XSLT book has page numbers for primary entries that also have secondary entries, but the SGML Handbook does not.
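
As a sketch of the "enforced by the processor with exceptions reported as warnings" option, something like the following Python would do. The function name, the (primary, secondary, location) tuple shape, and the message wording are all invented for illustration, and the primary terms are assumed to be normalized already.

```python
def primary_only_warnings(entries):
    """Report primary-only entries for terms that also have secondary entries.

    entries: list of (primary, secondary_or_None, location) tuples
             (a hypothetical shape chosen for this sketch).
    Returns a list of warning strings, in input order.
    """
    # Primary terms that appear at least once with a secondary term.
    with_secondary = {p for p, s, _ in entries if s is not None}
    return [
        f"'{p}' at {loc}: primary-only entry for a term that also has secondary entries"
        for p, s, loc in entries
        if s is None and p in with_secondary
    ]


entries = [
    ("XSLT", None, "topic1"),
    ("XSLT", "templates", "topic2"),
    ("SGML", None, "topic3"),
]
warnings = primary_only_warnings(entries)
```

Here only the bare "XSLT" entry is flagged. "SGML" has no secondary entries, so a primary-only entry for it is unremarkable. A processor that prefers the other editorial choice would simply skip this check and emit the page number.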

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 

On 8/12/19, 2:08 PM, "Robert D Anderson" <dita@lists.oasis-open.org on behalf of robander@us.ibm.com> wrote:

    Eliot raised a point that I think needs wider TC input during his review of the DITA 2.0 indexing content.
    
    Our examples of index entries show how one primary term with two secondary entries is considered equivalent to the same primary term defined twice (once with each secondary term). See figure 2 for the same example in our DITA 1.3 spec:
    http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part1-base/langRef/base/indexterm.html#indexterm
    
    Eliot pointed out that this reflects some assumptions about how processors must merge index entries, but those rules are never stated. So the question: how precise should the spec be about merging terms?
    
    I ask because in the end, this is really all about rendering -- and processors are free to render an index in all sorts of ways. For example, if I have <indexterm>oops</indexterm> in fifteen topics, I would expect most processors to render that as one index term with 15 links. That said, it would technically be valid for a processor to have fifteen entries for "oops". I don't think we can or should forbid that.
    
    With that in mind - how precise should the specification be when it comes to merging index terms?
    
    For example - how many of these should be rules in the spec? How many should be addressed but explicitly left up to implementations? How many should not be addressed at all?
    
    * Are "oops" and "Oops" equivalent? I would think not, so we can probably say that case sensitivity is important.
    * What if one has a leading or trailing space, and the other does not - is that significant?
    * What if the text content is the same, but one has non-indexterm sub-elements? For example:
    <indexterm>This is odd</indexterm>
    and
    <indexterm>This is <em>odd</em></indexterm>
    * What if one has a secondary term in the middle, and another has it at the end? For example, should we explicitly state that these primary terms are equivalent?
    <indexterm>This is <indexterm>secondary</indexterm> interesting</indexterm>
    and
    <indexterm>This is interesting<indexterm>secondary</indexterm></indexterm>
    * What if two terms have the same sort key? For example, would these all match?
    <indexterm>data</indexterm>
    <indexterm>data<sort-as>data</sort-as></indexterm>
    <indexterm>Data<sort-as>data</sort-as></indexterm>
    
    
    I'm sure there are a lot more edge cases, so that list above is really just to give a taste of the different things we might have to get into if we are exhaustive about "matching".
    Robert D. Anderson
    DITA-OT <https://dita-ot.org/> lead and Co-editor DITA 1.3 specification
    Marketing Services Center
    E-mail: robander@us.ibm.com
    
    11501 Burnet Rd, Austin, TX 78758-3400, USA
    
    
    
    



