


Subject: Re: [dita] Question about how to define equivalent index entries


My quick thoughts:

  • Are "oops" and "Oops" equivalent? I would think not, so we can probably say that case sensitivity is important. [CN] Yes
  • What if one has a leading or trailing space, and the other does not - is that significant? [CN] No, it's not significant. Leading/trailing whitespace should be trimmed.
  • What if the text content is the same, but one has non-indexterm sub-elements? For example:
    <indexterm>This is odd</indexterm>
    and
    <indexterm>This is <em>odd</em></indexterm> [CN] Implementation-dependent
  • What if one has a secondary term in the middle, and another has it at the end? For example, should we explicitly state that these primary terms are equivalent?
    <indexterm>This is <indexterm>secondary</indexterm> interesting</indexterm>
    and
    <indexterm>This is interesting<indexterm>secondary</indexterm></indexterm> [CN] Implementation (and maybe customer/document)-dependent
  • What if two terms have the same sort key? For example, would these all match?
    <indexterm>data</indexterm>
    <indexterm>data<sort-as>data</sort-as></indexterm>
    <indexterm>Data<sort-as>data</sort-as></indexterm> [CN] Implementation-dependent. I'm more concerned about the same term with different sort-as instructions, which again, should probably be implementation-dependent.
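A minimal sketch of a match key that follows the answers above, as one possible reading rather than anything the spec defines: comparison is case-sensitive, leading and trailing whitespace is trimmed, and the implementation-dependent cases are resolved by arbitrary choices (inline markup such as <em> is flattened to its text, <sort-as> is ignored for matching). The names `term_text` and `match_key` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Children that are not part of the term's own text (assumption of this sketch).
NOT_TERM_TEXT = {"indexterm", "sort-as"}

def term_text(elem):
    """Text of an <indexterm>, skipping nested <indexterm> and <sort-as>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in NOT_TERM_TEXT:
            # Inline markup such as <em>: keep only its text (one possible choice).
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)

def match_key(xml_fragment):
    """Hypothetical match key: trimmed, case-sensitive term text."""
    return term_text(ET.fromstring(xml_fragment)).strip()

# Leading/trailing whitespace is not significant; case is.
print(match_key("<indexterm> oops </indexterm>") == match_key("<indexterm>oops</indexterm>"))
print(match_key("<indexterm>oops</indexterm>") == match_key("<indexterm>Oops</indexterm>"))
```

Under these choices a secondary term in the middle of the primary text leaves a double space behind ("This is  interesting"), so the two placements in the fourth bullet would not match unless the processor also collapses internal whitespace. That is exactly the kind of decision left to the implementation.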



Chris

 

From: <dita@lists.oasis-open.org> on behalf of Robert D Anderson <robander@us.ibm.com>
Date: Monday, August 12, 2019 at 3:08 PM
To: "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>
Subject: [dita] Question about how to define equivalent index entries

 

Eliot raised a point that I think needs wider TC input during his review of the DITA 2.0 indexing content.

Our examples of index entries show that one primary term with two secondary entries is considered equivalent to the same primary term defined twice (once with each secondary term). See figure 2 for the same example in our DITA 1.3 spec:
http://docs.oasis-open.org/dita/dita/v1.3/errata02/os/complete/part1-base/langRef/base/indexterm.html#indexterm
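
To make the implied merge concrete, here is a rough sketch of how a processor might fold nested index terms into one tree. It is only an illustration: the term labels are made up, `add_term` and `build_index` are hypothetical names, and the sketch assumes each term's text precedes its nested terms and is matched exactly after trimming.

```python
import xml.etree.ElementTree as ET

def add_term(tree, elem):
    """Merge one <indexterm> and its nested terms into a nested dict."""
    # Assumption: the term text comes before any nested <indexterm>.
    label = (elem.text or "").strip()
    node = tree.setdefault(label, {})
    for child in elem.findall("indexterm"):
        add_term(node, child)

def build_index(fragments):
    """Build a merged index tree from a list of <indexterm> fragments."""
    tree = {}
    for fragment in fragments:
        add_term(tree, ET.fromstring(fragment))
    return tree

# One primary term carrying two secondary terms ...
combined = build_index([
    "<indexterm>primary"
    "<indexterm>secondary A</indexterm>"
    "<indexterm>secondary B</indexterm>"
    "</indexterm>",
])

# ... versus the same primary term defined twice, once per secondary term.
separate = build_index([
    "<indexterm>primary<indexterm>secondary A</indexterm></indexterm>",
    "<indexterm>primary<indexterm>secondary B</indexterm></indexterm>",
])

print(combined == separate)
```

The two inputs produce the same tree only because the sketch treats identical primary text as the same entry, which is the unstated rule in question.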

Eliot pointed out that this reflects some assumptions about how processors must merge index entries, but those rules are never stated. So the question: how precise should the spec be about merging terms?

I ask because in the end, this is really all about rendering -- and processors are free to render an index in all sorts of ways. For example, if I have <indexterm>oops</indexterm> in fifteen topics, I would expect most processors to render that as one index term with 15 links. That said, it would technically be valid for a processor to have fifteen entries for "oops". I don't think we can or should forbid that.
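
As a sketch of the common rendering described above, a processor could group occurrences by term text and collect one link per topic. The topic file names and the `merge_occurrences` helper are hypothetical, and exact matching on trimmed text is an assumption.

```python
from collections import defaultdict

def merge_occurrences(occurrences):
    """Group (term, topic) pairs into one entry per term with a list of links."""
    index = defaultdict(list)
    for term, topic in occurrences:
        index[term.strip()].append(topic)
    return dict(index)

# The same term appearing in fifteen hypothetical topics.
occurrences = [("oops", f"topic{n}.dita") for n in range(1, 16)]
merged = merge_occurrences(occurrences)

print(len(merged), len(merged["oops"]))
```

A processor that skipped the grouping step and emitted fifteen separate "oops" entries would still be rendering valid input.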

With that in mind - how precise should the specification be when it comes to merging index terms?

For example - how many of these should be rules in the spec? How many should be addressed but explicitly left up to implementations? How many should not be addressed at all?

  • Are "oops" and "Oops" equivalent? I would think not, so we can probably say that case sensitivity is important.
  • What if one has a leading or trailing space, and the other does not - is that significant?
  • What if the text content is the same, but one has non-indexterm sub-elements? For example:
    <indexterm>This is odd</indexterm>
    and
    <indexterm>This is <em>odd</em></indexterm>
  • What if one has a secondary term in the middle, and another has it at the end? For example, should we explicitly state that these primary terms are equivalent?
    <indexterm>This is <indexterm>secondary</indexterm> interesting</indexterm>
    and
    <indexterm>This is interesting<indexterm>secondary</indexterm></indexterm>
  • What if two terms have the same sort key? For example, would these all match?
    <indexterm>data</indexterm>
    <indexterm>data<sort-as>data</sort-as></indexterm>
    <indexterm>Data<sort-as>data</sort-as></indexterm>


I'm sure there are a lot more edge cases, so the list above is really just to give a taste of the different things we might have to get into if we are exhaustive about "matching".

Robert D. Anderson
DITA-OT lead and Co-editor DITA 1.3 specification
Marketing Services Center

 


E-mail: robander@us.ibm.com

11501 Burnet Rd, Austin, TX 78758-3400, USA

IBM





