OASIS Mailing List Archives

dita-translation message



Subject: From Bruce Esrig: Notes on the acronym proposal


 

-----Original Message-----
From: Bruce Esrig [mailto:esrig@alumni.princeton.edu] 
Sent: Friday, June 29, 2007 3:43 AM
To: JoAnn Hackos
Subject: Re: Notes on the acronym proposal

Hi JoAnn,

Could you echo this to the list?

When it's time to finalize this proposal, it might be helpful to have a
wiki page that everyone can see (for example a temporary wiki page at
dita.xml.org) that contains the current proposal, so that when a change
is agreed upon, people can see the current proposal by using refresh in
their browsers.

Bruce

========

1. //Language-by-language rules
         We have tried to avoid creating language-by-language rules because
         the processing overhead is not trivial. Kara's recommendation would
         require a language-by-language rule.//

To explain this note more fully:

Language-by-language rules arise when some languages require declension
of words in the expanded form, while others do not. In languages that do
not, it is feasible to show the expanded form first with the short form
in parentheses. Some style guides require this. In languages that do,
there is an advantage to putting the short form first, since then the
language rules would (in most languages?) permit the short form not to
be declined. This is good for translation because a single instance of
the term can be maintained in the terminology base and used in multiple
grammatical contexts.

If the short form can appear second in non-declining languages and first
in declining languages, then it is tempting to create a
language-specific processing behavior. If the short form and expanded
form are kept in separate elements, then processing can present them in
an order appropriate to the language.
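A minimal Python sketch of that idea follows. It assumes the short and expanded forms arrive as separate values and that the processor knows the target language. The language codes in `DECLINING_LANGUAGES` and the function name `render_acronym` are hypothetical placeholders for illustration. They are not a proposed list or a DITA Open Toolkit API.

```python
# Hypothetical placeholder set: deciding which languages belong here is
# the open question in the proposal, not something this sketch settles.
DECLINING_LANGUAGES = {"pl", "ru", "cs"}


def render_acronym(short: str, expanded: str, lang: str) -> str:
    """Present two separately stored pieces in a language-dependent order.

    Declining languages: short form first, which (per the notes above)
    lets the short form stay undeclined.
    Other languages: expanded form first, short form in parentheses.
    """
    # Compare on the primary language subtag only, e.g. "pl-PL" -> "pl".
    primary = lang.lower().split("-")[0]
    if primary in DECLINING_LANGUAGES:
        return f"{short} ({expanded})"
    return f"{expanded} ({short})"


if __name__ == "__main__":
    print(render_acronym("HIV", "human immunodeficiency virus", "en-US"))
    print(render_acronym("HIV", "human immunodeficiency virus", "pl-PL"))
```

Because the source keeps the two forms in separate elements, the ordering decision lives entirely in the processor. The same source renders as "human immunodeficiency virus (HIV)" for en-US and "HIV (human immunodeficiency virus)" for pl-PL.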

An objection to this is that the capability would have to be
implemented in all conforming DITA processing systems. Some DITA
processing systems are modifications of the DITA Open Toolkit; others
are implemented completely independently of it. If a language-specific
processing behavior is defined, it would not be sufficient to implement
it only in the DITA Open Toolkit.

Note, however, that the DITA Open Toolkit serves precisely this purpose
in the community: to demonstrate that the requirements in the
specification can be met, and to provide a reference implementation that
does meet those requirements. We would want to know from other vendors
how burdensome they would find a language-specific rule if it were "the
right thing to do".

As an alternative, a flag of some sort could be used to indicate whether
to present the short form or expanded form first. The suggestion on the
call, to put this flag at the element level, would be difficult to
maintain since all elements would behave the same way within a given
language. It's better to put the dependency at a global level, either in
a flag that controls the order or in a deduction that is made
automatically once the language is known. Basing the order on the
language is more reliable since otherwise the flag has to be set
correctly when the processing job is set up.

However, the flag may be required as an override to the default for a
language. The override may need to be language-specific. Some
non-declining languages may need to support two orders depending on what
the local style guide says about ordering in that language. Another
alternative that was not on the table in the most recent discussion is
to implement acronyms in only one way, with the short form first.
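The precedence described above (a job-level, possibly language-specific override first, then a default deduced from the language) could be sketched as follows. `Order`, `LANGUAGE_DEFAULTS`, `resolve_order`, and the `fallback` parameter are all hypothetical names. The table entries are placeholders, not proposed defaults.

```python
from enum import Enum
from typing import Mapping, Optional


class Order(Enum):
    SHORT_FIRST = "short-first"
    EXPANDED_FIRST = "expanded-first"


# Hypothetical defaults "deduced automatically once the language is known".
# The entries are placeholders, not a proposed list.
LANGUAGE_DEFAULTS: Mapping[str, Order] = {
    "pl": Order.SHORT_FIRST,
    "en": Order.EXPANDED_FIRST,
}


def resolve_order(
    lang: str,
    overrides: Optional[Mapping[str, Order]] = None,
    fallback: Order = Order.EXPANDED_FIRST,
) -> Order:
    """Pick one presentation order for a whole processing job.

    Precedence: a language-specific override supplied when the job is
    set up, then the default deduced from the language, then a fallback.
    Nothing is read from individual elements.
    """
    primary = lang.lower().split("-")[0]
    if overrides and primary in overrides:
        return overrides[primary]
    return LANGUAGE_DEFAULTS.get(primary, fallback)


if __name__ == "__main__":
    print(resolve_order("en-US"))
    # A local style guide that wants short-first for English:
    print(resolve_order("en-GB", overrides={"en": Order.SHORT_FIRST}))
```

The order is resolved once per job rather than per element, which avoids the maintenance problem of an element-level flag. Dropping the per-language table and always returning `Order.SHORT_FIRST` would correspond to the single-order alternative.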

Even if a combined form were used and the pieces were extracted from it,
processing would still need to know which order to present the pieces
in. So there is no reason to break the XML convention of putting
separate pieces of information in separate elements in the source.

2. In case it helps to look at what we're simplifying away from: another
case that the current proposal does not support is versioning. Suppose
that a terminology bank has multiple historically accurate but
time-bounded entries for a term.

An example that comes to mind is described in
http://en.wikipedia.org/wiki/Timeline_of_AIDS, namely: "1986: HIV (human
immunodeficiency virus) is adopted as name of the retrovirus that was
first proposed as the cause of AIDS by Luc Montagnier of France, who
named it LAV (lymphadenopathy associated virus) and Robert Gallo of the
United States, who named it HTLV-III (human T-lymphotropic virus type
III)".

If we wished to record a relationship among these terms in the source in
DITA, we would need two IDs for the term: one for the surface form and
one for the meaning. The meaning is the term bank entry that recognizes
the connection among the alternate surface forms. This could be done by
treating the ID in the term as a reference to the surface form and
providing the ID for the meaning, when required, within a <data> element
nested within the outermost <acronym> element.

A passage that referred to multiple terms would do so by using each
surface form as needed, and indicating the connection among them using
the <data> element.

The DITA markup for this passage would be:

<p>1986</p>
<ul><li>
   <acronym id="hiv-current">
       <data name="meaning" value="hiv"/>
       <short>HIV</short><expanded>human immunodeficiency virus</expanded>
   </acronym>
is adopted as name of the retrovirus that was first proposed as the
cause of AIDS by Luc Montagnier of France, who named it
   <acronym id="hiv-montagnier">
       <data name="meaning" value="hiv"/>
       <short>LAV</short><expanded>lymphadenopathy associated virus</expanded>
   </acronym>
and Robert Gallo of the United States, who named it
   <acronym id="hiv-gallo">
       <data name="meaning" value="hiv"/>
       <short>HTLV-III</short><expanded>human T-lymphotropic virus type III</expanded>
   </acronym>.
</li></ul>

According to this markup, <data> would need to be permitted within
<keyword> since <acronym> is proposed as a specialization of <keyword>.
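To check that this markup carries enough information to recover the connection among the surface forms, here is a rough Python sketch using the standard library's `xml.etree.ElementTree`. It groups surface-form IDs by their shared meaning ID. It assumes plain, well-formed XML with the surrounding prose trimmed. It matches on tag names only, with no DTD or specialization handling. `surface_forms_by_meaning` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The passage above, with the surrounding prose trimmed.
PASSAGE = """<li>
  <acronym id="hiv-current"><data name="meaning" value="hiv"/>
    <short>HIV</short><expanded>human immunodeficiency virus</expanded></acronym>
  is adopted as name of the retrovirus, named
  <acronym id="hiv-montagnier"><data name="meaning" value="hiv"/>
    <short>LAV</short><expanded>lymphadenopathy associated virus</expanded></acronym>
  by Montagnier and
  <acronym id="hiv-gallo"><data name="meaning" value="hiv"/>
    <short>HTLV-III</short><expanded>human T-lymphotropic virus type III</expanded></acronym>
  by Gallo.
</li>"""


def surface_forms_by_meaning(xml_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each meaning ID to the (surface-form ID, short form) pairs sharing it."""
    root = ET.fromstring(xml_text)
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for acronym in root.iter("acronym"):  # document order
        # The ID on <acronym> identifies the surface form.
        surface_id = acronym.get("id")
        short = acronym.findtext("short", default="")
        # The meaning ID, when present, sits in a nested <data name="meaning">.
        for data in acronym.findall("data"):
            if data.get("name") == "meaning":
                groups[data.get("value")].append((surface_id, short))
    return dict(groups)


if __name__ == "__main__":
    print(surface_forms_by_meaning(PASSAGE))
```

All three surface forms (HIV, LAV, HTLV-III) end up under the single meaning ID `hiv`, while each keeps its own surface-form ID.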

3. We may need to do more work to unify the term-like elements in DITA.
As of February 2007, the DITA 1.1 Architecture guide treats "metadata"
separately, but seems to lack a thorough statement on terminology.
<keyword>, <indexterm>, and <term> have related behaviors, and may need
to be managed in parallel.

Applying this question to the <acronym> proposal ... As in the case of
the <keyword> element, the <data> element would probably need to be
supported within <term> and most likely <indexterm>.
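If <data> were allowed in all of these elements, a processor could look up the meaning ID the same way regardless of which term-like element carries it. The Python sketch below assumes that. The `TERM_LIKE` set and the `meaning_of` helper are hypothetical. The sketch matches on tag names for brevity, whereas a real DITA processor would more likely match on the `class` attribute so that specializations such as <acronym> are recognized as <keyword>.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical: the term-like elements that would all accept <data name="meaning">.
TERM_LIKE = {"keyword", "acronym", "term", "indexterm"}


def meaning_of(elem: ET.Element) -> Optional[str]:
    """Return the meaning ID carried by a term-like element, if any."""
    if elem.tag not in TERM_LIKE:
        return None
    for data in elem.findall("data"):
        if data.get("name") == "meaning":
            return data.get("value")
    return None


SNIPPET = """<p>
  <term><data name="meaning" value="hiv"/>HIV</term>
  <indexterm><data name="meaning" value="hiv"/>HIV</indexterm>
  <keyword>LAV</keyword>
</p>"""

if __name__ == "__main__":
    for child in ET.fromstring(SNIPPET):
        print(child.tag, meaning_of(child))
```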

At 12:54 PM 6/27/2007, you wrote:
>Hello Friends,
>I've added more notes to Gershon's meeting minutes. Andrzej and Rodolfo
>in particular, please review the notes. I've also asked Kara W to
>review the entire proposal since she had not yet read it.
>
>We have two proposals in the notes that we need to consider next week.
>Each involves adding a third element to our plan.
>
>Kara's primary concern seems to be with the post-processing for term 
>extraction.
>
>I also wonder whether that processing could be revised to account for
>the acronym in the expanded form rather than adding complexity to this
>proposal.
>
>JoAnn
>
>JoAnn T. Hackos, PhD
>President
>Comtech Services, Inc.
>710 Kipling Street, Suite 400
>Denver, CO 80215
>303-232-7586
>joann.hackos@comtech-serv.com
>joannhackos Skype
>www.comtech-serv.com
>
>
