Subject: Re: [dita-translation] From Bruce Esrig: Notes on the acronym proposal
Hi Bruce, JoAnn, and others,

I am a little confused by one of the points here:

> If the short form can appear second in non-declining languages and first
> in declining languages, then it is tempting to create a
> language-specific processing behavior. If the short form and expanded
> form are kept in separate elements, then processing can present them in
> an order appropriate to the language.
>
> An objection to this is that this capability would have to be
> implemented in all conforming DITA processing systems. There are DITA
> processing systems that are modifications of the DITA Open Toolkit or
> completely independent implementations from it. If a language-specific
> processing behavior is defined, it would not be sufficient to implement
> it only in the DITA Open Toolkit.

I worry that we are trying to design the language such that no processor has to have any knowledge of the <acronym> element. While this would make my own life easier with regard to the toolkit, it feels like the wrong thing to do.

In general, processors expect to implement something as part of supporting a new element. Conforming DITA processors do this all the time. Some elements are easy, such as <b>, which only requires a bit of highlighting. Others are more difficult, such as <properties>, which is expected to display as a table with localized default headings. Creating a localized rule for acronyms seems similar to supporting an element that requires a new localized string.

Here is my understanding, with regard to an acronym's first occurrence:

* Some languages use the long form first
* Some languages use the short form first, to get around declension or capitalization problems
* It is not the specification's role to mandate which language does what; that is left up to renderers.
* Processors necessarily support a limited number of languages (for generated text, display direction, etc.)
* A processor should be able to discover the general acronym preference for each language that it already supports. This is done once, before the product release.
* A processor MAY allow users to override the setting for one or all languages
* Difficulties only arise when a language needs a mix of short-first and long-first.

We at the Translation SC must then determine:

1. For those languages, is it still appropriate to specify a general rule - 95% do one thing, with some exceptions? We would seem to need a way to mark those exceptions.
2. Is there ever a case where no rule can be specified - 50% do one, and 50% do another? This would be more difficult to accommodate in markup.

As someone who works on a DITA processing system, creating a rule for each language that I already support seems pretty straightforward. I do not think that the toolkit is different than other processing systems, which makes me wonder about this objection.

Am I missing something in my summary? My understanding of acronym issues is based mostly on the discussions we've had in this group, so is there more to this than I realize?

Thanks -

Robert D Anderson
IBM Authoring Tools Development
Chief Architect, DITA Open Toolkit
(507) 253-8787, T/L 553-8787 (Good Monday & Thursday)


"JoAnn Hackos" <joann.hackos@comtech-serv.com>
06/29/2007 08:08 AM

To: <dita-translation@lists.oasis-open.org>, <mambrose@sdl.com>, <bhertz@sdl.com>, "Bryan Schnabel" <bryan.s.schnabel@tek.com>, Charles Pau/Cambridge/IBM@Lotus, <christian.lieske@sap.com>, <dpooley@sdl.com>, Dave A Schell/Raleigh/IBM@IBMUS, <esrig-ia@esrig.com>, <fsasaki@w3.org>, <rfletcher@sdl.com>, "Howard.Schwartz" <Howard.Schwartz@trados.com>, <ishida@w3.org>, <tony.jewtushenko@productinnovator.com>, <KARA@CA.IBM.COM>, <ysavourel@translate.com>
cc:
Subject: [dita-translation] From Bruce Esrig: Notes on the acronym proposal

-----Original Message-----
From: Bruce Esrig [mailto:esrig@alumni.princeton.edu]
Sent: Friday, June 29, 2007 3:43 AM
To: JoAnn Hackos
Subject: Re: Notes on the acronym proposal

Hi JoAnn,

Could you echo this to the list?

When it's time to finalize this proposal, it might be helpful to have a wiki page that everyone can see (for example a temporary wiki page at dita.xml.org) that contains the current proposal, so that when a change is agreed upon, people can see the current proposal by using refresh in their browsers.

Bruce

========

1. //Language-by-language rules

We have tried to avoid creating language-by-language rules because the processing overhead is not trivial. Kara's recommendation would require a language-by-language rule.//

To explain this note more fully:

Language-by-language rules arise when some languages require declension of words in the expanded form, while others do not. In languages that do not, it is feasible to show the expanded form first with the short form in parentheses. Some style guides require this. In languages that do, there is an advantage to putting the short form first, since then the language rules would (in most languages?) permit the short form not to be declined. This is good for translation because a single instance of the term can be maintained in the terminology base and used in multiple grammatical contexts.

If the short form can appear second in non-declining languages and first in declining languages, then it is tempting to create a language-specific processing behavior. If the short form and expanded form are kept in separate elements, then processing can present them in an order appropriate to the language.

An objection to this is that this capability would have to be implemented in all conforming DITA processing systems.
There are DITA processing systems that are modifications of the DITA Open Toolkit or completely independent implementations from it. If a language-specific processing behavior is defined, it would not be sufficient to implement it only in the DITA Open Toolkit.

Note, however, that the DITA Open Toolkit serves precisely this purpose in the community: to demonstrate that the requirements in the specification can be met, and to provide a reference implementation that does meet those requirements. We would want to know from other vendors how burdensome they would find a language-specific rule if it were "the right thing to do".

As an alternative, a flag of some sort could be used to indicate whether to present the short form or expanded form first. The suggestion on the call, to put this flag at the element level, would be difficult to maintain since all elements would behave the same way within a given language. It's better to put the dependency at a global level, either in a flag that controls the order or in a deduction that is made automatically once the language is known. Basing the order on the language is more reliable since otherwise the flag has to be set correctly when the processing job is set up. However, the flag may be required as an override to the default for a language. The override may need to be language-specific. Some non-declining languages may need to support two orders depending on what the local style guide says about ordering in that language.

Another alternative that was not on the table in the most recent discussion is to implement acronyms in only one way, with the short form first.

Regarding using a combined form and extracting the pieces from it, there is still the requirement to know which order to present the pieces in. This means that there is no reason to break the XML convention of putting separate pieces of information in separate elements in the source.

2. In case it helps to look at what we're simplifying away from ...
Another case that the current proposal does not support is versioning. Suppose that a terminology bank has multiple historically-accurate but time-bounded entries for a term. An example that comes to mind is described in http://en.wikipedia.org/wiki/Timeline_of_AIDS, namely:

"1986: HIV (human immunodeficiency virus) is adopted as name of the retrovirus that was first proposed as the cause of AIDS by Luc Montagnier of France, who named it LAV (lymphadenopathy associated virus) and Robert Gallo of the United States, who named it HTLV-III (human T-lymphotropic virus type III)".

If we wished to record a relationship among these terms in the source in DITA, we would need two IDs for the term: one for the surface form and one for the meaning. The meaning is the term bank entry that recognizes the connection among the alternate surface forms.

This could be done by treating the ID in the term as a reference to the surface form and providing the ID for the meaning, when required, within a <data> element nested within the outermost <acronym> element. A passage that referred to multiple terms would do so by using each surface form as needed, and indicating the connection among them using the <data> element.

The DITA markup for this passage would be:

    <p>1986</p>
    <ul><li>
      <acronym id="hiv-current">
        <data name="meaning" value="hiv"/>
        <short>HIV</short><expanded>human immunodeficiency virus</expanded>
      </acronym>
      is adopted as name of the retrovirus that was first proposed as the
      cause of AIDS by Luc Montagnier of France, who named it
      <acronym id="hiv-montagnier">
        <data name="meaning" value="hiv"/>
        <short>LAV</short><expanded>lymphadenopathy associated virus</expanded>
      </acronym>
      and Robert Gallo of the United States, who named it
      <acronym id="hiv-gallo">
        <data name="meaning" value="hiv"/>
        <short>HTLV-III</short><expanded>human T-lymphotropic virus type III</expanded>
      </acronym>.
    </li></ul>

According to this markup, <data> would need to be permitted within <keyword> since <acronym> is proposed as a specialization of <keyword>.

3. We may need to do more work to unify the term-like elements in DITA.

As of February 2007, the DITA 1.1 Architecture guide treats "metadata" separately, but seems to lack a thorough statement on terminology. <keyword>, <indexterm>, and <term> have related behaviors, and may need to be managed in parallel.

Applying this question to the <acronym> proposal ... As in the case of the <keyword> element, the <data> element would probably need to be supported within <term> and most likely <indexterm>.

At 12:54 PM 6/27/2007, you wrote:
>Hello Friends,
>I've added more notes to Gershon's meeting minutes. Andrzej and Rodolfo
>in particular, please review the notes. I've also asked Kara W to
>review the entire proposal since she had not yet read it.
>
>We have two proposals in the notes that we need to consider next week.
>Each involves adding a third element to our plan.
>
>Kara's primary concern seems to be with the post-processing for term
>extraction.
>
>I also wonder if that processing could not be revised to account for
>the acronym in the expanded form rather than adding complexity to this
>proposal.
>
>JoAnn
>
>JoAnn T. Hackos, PhD
>President
>Comtech Services, Inc.
>710 Kipling Street, Suite 400
>Denver, CO 80215
>303-232-7586
>joann.hackos@comtech-serv.com
>joannhackos Skype
>www.comtech-serv.com
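
For readers following the thread, the language-dependent ordering discussed above (a per-language default for the first occurrence of an acronym, plus an optional user override) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the DITA Open Toolkit or any other processor: the function names, the "short-first"/"expanded-first" labels, the two sample language entries, and the fallback for unlisted languages are all hypothetical.

```python
from typing import Dict, Optional

SHORT_FIRST = "short-first"
EXPANDED_FIRST = "expanded-first"

# Hypothetical per-language defaults. The thread does not say which
# languages take which order; these two entries are placeholders only.
DEFAULT_ORDER: Dict[str, str] = {
    "en": EXPANDED_FIRST,  # assumed: expanded form, short form in parentheses
    "pl": SHORT_FIRST,     # assumed: short form first to avoid declining it
}


def acronym_order(lang: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the ordering for a language, honouring a user override if given."""
    # Reduce a tag such as "en-US" to its primary subtag "en".
    primary = lang.split("-")[0].lower()
    if overrides and primary in overrides:
        return overrides[primary]
    # Assumption: languages without an entry fall back to expanded-first.
    return DEFAULT_ORDER.get(primary, EXPANDED_FIRST)


def render_first_occurrence(
    short: str,
    expanded: str,
    lang: str,
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    """Render the first occurrence of an acronym in the order chosen for lang."""
    if acronym_order(lang, overrides) == SHORT_FIRST:
        return f"{short} ({expanded})"
    return f"{expanded} ({short})"


if __name__ == "__main__":
    print(render_first_occurrence("HIV", "human immunodeficiency virus", "en"))
    print(render_first_occurrence("HIV", "human immunodeficiency virus", "pl"))
    # A job-level override for one language, as the thread suggests MAY be allowed.
    print(render_first_occurrence(
        "HIV", "human immunodeficiency virus", "en", {"en": SHORT_FIRST}
    ))
```

Keeping the short and expanded forms as separate arguments mirrors the point made in the thread that separate elements let the processor choose the order. Languages that mix both orders would still need some way to mark exceptions, which this sketch does not attempt.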