Subject: From Bruce Esrig: Notes on the acronym proposal
-----Original Message-----
From: Bruce Esrig [mailto:esrig@alumni.princeton.edu]
Sent: Friday, June 29, 2007 3:43 AM
To: JoAnn Hackos
Subject: Re: Notes on the acronym proposal

Hi JoAnn,

Could you echo this to the list?

When it's time to finalize this proposal, it might be helpful to have a wiki page that everyone can see (for example, a temporary wiki page at dita.xml.org) that contains the current proposal, so that when a change is agreed upon, people can see the current proposal by refreshing their browsers.

Bruce

========

1. //Language-by-language rules

We have tried to avoid creating language-by-language rules because the processing overhead is not trivial. Kara's recommendation would require a language-by-language rule.//

To explain this note more fully:

Language-by-language rules arise when some languages require declension of words in the expanded form, while others do not.

In languages that do not, it is feasible to show the expanded form first with the short form in parentheses. Some style guides require this.

In languages that do, there is an advantage to putting the short form first, since then the language rules would (in most languages?) permit the short form not to be declined. This is good for translation because a single instance of the term can be maintained in the terminology base and used in multiple grammatical contexts.

If the short form can appear second in non-declining languages and first in declining languages, then it is tempting to create a language-specific processing behavior. If the short form and expanded form are kept in separate elements, then processing can present them in an order appropriate to the language.

An objection to this is that this capability would have to be implemented in all conforming DITA processing systems. Some DITA processing systems are modifications of the DITA Open Toolkit, and others are implemented completely independently of it.
If a language-specific processing behavior is defined, it would not be sufficient to implement it only in the DITA Open Toolkit. Note, however, that the DITA Open Toolkit serves precisely this purpose in the community: to demonstrate that the requirements in the specification can be met, and to provide a reference implementation that does meet those requirements. We would want to know from other vendors how burdensome they would find a language-specific rule if it were "the right thing to do".

As an alternative, a flag of some sort could be used to indicate whether to present the short form or the expanded form first. The suggestion on the call, to put this flag at the element level, would be difficult to maintain: all elements would behave the same way within a given language, so the same value would have to be repeated on every element. It's better to put the dependency at a global level, either in a flag that controls the order or in a deduction that is made automatically once the language is known.

Basing the order on the language is more reliable, since otherwise the flag has to be set correctly each time a processing job is set up. However, the flag may still be required as an override to the default for a language. The override may need to be language-specific: some non-declining languages may need to support both orders, depending on what the local style guide says about ordering in that language.

Another alternative that was not on the table in the most recent discussion is to implement acronyms in only one way, with the short form first.

Regarding using a combined form and extracting the pieces from it, there is still the requirement to know which order to present the pieces in. This means that there is no reason to break the XML convention of putting separate pieces of information in separate elements in the source.

2. In case it helps to look at what we're simplifying away from ...

Another case that the current proposal does not support is versioning.
Suppose that a terminology bank has multiple historically accurate but time-bounded entries for a term. An example that comes to mind is described in http://en.wikipedia.org/wiki/Timeline_of_AIDS, namely:

"1986: HIV (human immunodeficiency virus) is adopted as name of the retrovirus that was first proposed as the cause of AIDS by Luc Montagnier of France, who named it LAV (lymphadenopathy associated virus) and Robert Gallo of the United States, who named it HTLV-III (human T-lymphotropic virus type III)".

If we wished to record a relationship among these terms in the DITA source, we would need two IDs for the term: one for the surface form and one for the meaning. The meaning is the term bank entry that recognizes the connection among the alternate surface forms.

This could be done by treating the ID in the term as a reference to the surface form and providing the ID for the meaning, when required, within a <data> element nested within the outermost <acronym> element. A passage that referred to multiple terms would do so by using each surface form as needed, and indicating the connection among them using the <data> element.

The DITA markup for this passage would be:

  <p>1986</p>
  <ul><li>
    <acronym id="hiv-current">
      <data name="meaning" value="hiv"/>
      <short>HIV</short><expanded>human immunodeficiency virus</expanded>
    </acronym>
    is adopted as name of the retrovirus that was first proposed as the
    cause of AIDS by Luc Montagnier of France, who named it
    <acronym id="hiv-montagnier">
      <data name="meaning" value="hiv"/>
      <short>LAV</short><expanded>lymphadenopathy associated virus</expanded>
    </acronym>
    and Robert Gallo of the United States, who named it
    <acronym id="hiv-gallo">
      <data name="meaning" value="hiv"/>
      <short>HTLV-III</short><expanded>human T-lymphotropic virus type III</expanded>
    </acronym>.
  </li></ul>

According to this markup, <data> would need to be permitted within <keyword>, since <acronym> is proposed as a specialization of <keyword>.

3.
We may need to do more work to unify the term-like elements in DITA.

As of February 2007, the DITA 1.1 Architecture guide treats "metadata" separately, but seems to lack a thorough statement on terminology. <keyword>, <indexterm>, and <term> have related behaviors, and may need to be managed in parallel.

Applying this question to the <acronym> proposal ...

As in the case of the <keyword> element, the <data> element would probably need to be supported within <term> and most likely <indexterm>.

At 12:54 PM 6/27/2007, you wrote:
>Hello Friends,
>I've added more notes to Gershon's meeting minutes. Andrzej and Rodolfo
>in particular, please review the notes. I've also asked Kara W to
>review the entire proposal since she had not yet read it.
>
>We have two proposals in the notes that we need to consider next week.
>Each involves adding a third element to our plan.
>
>Kara's primary concern seems to be with the post-processing for term
>extraction.
>
>I also wonder if that processing could not be revised to account for
>the acronym in the expanded form rather than adding complexity to this
>proposal.
>
>JoAnn
>
>JoAnn T. Hackos, PhD
>President
>Comtech Services, Inc.
>710 Kipling Street, Suite 400
>Denver, CO 80215
>303-232-7586
>joann.hackos@comtech-serv.com
>joannhackos Skype
>www.comtech-serv.com
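
Postscript to item 1: the ordering rule discussed there (a default deduced from the language, plus an optional override that can itself be language-specific) can be sketched in a few lines. This is only an illustration under stated assumptions, not part of the proposal and not how the DITA Open Toolkit behaves: the function name, the override format, and the set of "short form first" languages are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical default: languages assumed to prefer the short form first
# because the expanded form would otherwise need to be declined.
# Illustrative only; the proposal does not define such a list.
SHORT_FIRST_LANGS = {"pl", "ru", "fi"}


def render_acronym(short: str, expanded: str, lang: str,
                   override: Optional[Dict[str, str]] = None) -> str:
    """Return the first-occurrence text for an acronym.

    `override` maps a base language code to "short-first" or
    "expanded-first" and takes precedence over the language default.
    """
    base = lang.split("-")[0].lower()  # "pl-PL" -> "pl"
    order = (override or {}).get(base)
    if order is None:
        # No override for this language: deduce the order from the language.
        order = "short-first" if base in SHORT_FIRST_LANGS else "expanded-first"
    if order == "short-first":
        return f"{short} ({expanded})"
    return f"{expanded} ({short})"


print(render_acronym("HIV", "human immunodeficiency virus", "en"))
print(render_acronym("HIV", "human immunodeficiency virus", "pl-PL"))
print(render_acronym("HIV", "human immunodeficiency virus", "en",
                     override={"en": "short-first"}))
```

Because the short and expanded forms arrive as separate values, the only decision left to the processor is the order, which is the argument for keeping them in separate elements in the source.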