Subject: Re: [dita] Practical question - indexing elements in the language reference
From: Richard Hamilton <hamilton@xmlpress.net>
To: dita <dita@lists.oasis-open.org>
Date: Fri, 10 Jul 2015 15:39:07 -0700

Regarding the discussion on translation from a few messages back in this thread, I exchanged email with an expert in translation management systems and learned a few things that may help if the spec is ever translated. We didn't discuss specific translation management (TM) systems, but I think most of the widely used systems would support the strategies described below:

1) TM systems can be adjusted to show particular tags to the translator, so the text inside an index term can be clearly identified as being an index term and not part of running text. That would make it easier for a translator to deal with text like <indexterm>Shakespeare, William<indexterm>the works of</indexterm></indexterm>, which would otherwise be ungrammatical, especially if embedded in the middle of a sentence.

2) If you have two sentences that are identical, except one has an embedded index term, TM systems may not see them as a match and, therefore, might treat them as two distinct sentences that would have to be translated separately. To combat this, it is best, when possible, to put an embedded index term immediately before or after the sentence the term appears in. I think in most cases, doing this won't cause any problems.
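
As a rough illustration of point 2, the markup below shows the same sentence with the index term embedded mid-sentence and then moved to just before the sentence. The sentence and the index entries are invented for this example, and whether a given TM system treats the second form as a match for the plain sentence depends on how its segmentation and inline-tag handling are configured, so treat this as a sketch rather than a guarantee.

```xml
<!-- Embedded mid-sentence: the inline tag splits the sentence text,
     so it may not match an otherwise identical sentence without the index term. -->
<p>Use the <indexterm>conref<indexterm>reusing content</indexterm></indexterm>conref attribute to reuse content.</p>

<!-- Moved to immediately before the sentence: the sentence text itself is unbroken. -->
<p><indexterm>conref<indexterm>reusing content</indexterm></indexterm>Use the conref attribute to reuse content.</p>
```

In the second form the index entry still points at the same paragraph, so the generated index should be unaffected.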

I think the bottom line is that embedding index terms is manageable in TM systems with a bit of planning.

Best regards,
Dick
-------
XML Press
XML for Technical Communicators
http://xmlpress.net
hamilton@xmlpress.net



On Jul 10, 2015, at 13:04, Kristen James Eberlein <kris@eberleinconsulting.com> wrote:

> Agree 100%. Indexing the element topics would be akin to indexing items in a dictionary -- Useless, especially given that we provide a topic that lists all elements.
> 
> I agree with Robert that the following items need indexing:
> 	• Attributes
> 	• Processing expectations
> I'd also add best practices ...
> Best,
> Kris
> 
> Kristen James Eberlein
> Chair, OASIS DITA Technical Committee
> Principal consultant, Eberlein Consulting
> www.eberleinconsulting.com
> +1 919 682-2290; kriseberlein (skype)
> 
> On 7/9/2015 1:29 PM, Robert D Anderson wrote:
>> On this week's call, I mentioned the poor quality of index entries in the language reference. Elements are indexed multiple ways, with no consistent design. The all-inclusive package has just over 600 elements; those topics (plus the containers) have over 2600 primary, secondary, or tertiary <indexterm> elements.
>> 
>> All of the following are used today:
>> * Primary entry with the element name (most common but not universal)
>> * Primary entry with natural language (add "abbreviation list" to "abbrevlist")
>> * Element name as secondary entry under a domain or module (we have primary "highlighting domain" with one secondary entry for each element)
>> * Primary entry based on purpose (we have image, alt, and longdescref all indexed with "images")
>> * Various other methods (<shortdesc> has entries under the primary terms "topics", "maps", "elements", "examples", "processing expectations", and "short descriptions") 
>> 
>> In thinking about which of these are useful, I remembered Eliot's comment yesterday:
>> > Of course, in the ideal index, most of the terms are *not* in the titles,
>> > since part of the point of an index is to relate non-obvious things to
>> > their locations in the doc.
>> 
>> Every element in the langRef uses the element name as the title. Is it useful to index the element name, exactly as it appears in the TOC?
>> 
>> Personally I only use the index to look up architectural concepts or features. For element names, I use the TOC. In the all-inclusive package today, the conceptual terms are spread among 600+ element names and hundreds more near-identical primary entries. We could clean up primary entries by indexing elements only under the domain / module name, but that is only helpful if you already know how we group elements. It also raises the same question - is it useful for the index to reproduce exactly the same grouping already found in the TOC?
>> 
>> With all that in mind, I lean towards a default policy of not indexing every element topic. I suggest this as a general policy, for topics that define a single element, not an absolute rule about primary entries in the langRef. Attribute topics in the langRef need indexing. Elements with special processing expectations need indexing. Other groupings will be useful (maybe a primary entry grouping all deprecated elements). Other exceptions are expected. 
>> 
>> Thoughts? I expect there will be at least a few on this one...
>> 
>> Robert D Anderson
>> IBM Authoring Tools Development
>> Chief Architect, DITA Open Toolkit (http://www.dita-ot.org/)
> 
References:
- Practical question - indexing elements in the language reference
  - From: "Robert D Anderson" <robander@us.ibm.com>
- Re: [dita] Practical question - indexing elements in the language reference
  - From: Kristen James Eberlein <kris@eberleinconsulting.com>