Subject: RE: [xliff] comments on dtd
From: John Reid <jreid@novell.com>
To: xliff@lists.oasis-open.org
Date: Wed, 23 Jan 2002 22:54:34 -0700
Hi All,

A review of the spec may clear up some of these points better than this
discussion. I understand it wasn't easily available before our meeting
and some may not have had a chance to review it. I hope those who
haven't already done so will read it. It is now posted at our TC
website on OASIS.

I would like to elaborate a little on Yves's answers, as follows.

>>> Yves Savourel <ysavourel@translate.com> 1/23/02 4:56:16 PM >>>
Thanks for posting those comments David.

I'll try to answer a few of them. Since we haven't worked together yet,
there may be some terms we don't use the same way: if I'm not clear,
please let me know and I'll try to reformulate.


> 1. document validators - we should have support for W3C Schema,
> Schematron and RELAX NG, as well as DTD.

I agree that we should have different ways to specify XLIFF so different
people using different tools can have easy access to it. We can probably
generate some of those schemas (or at least a base to work from) from
the DTD using converters, as Christian showed me yesterday. I guess we
should open the discussion on what schemas to use besides the DTD.

<jr>This isn't a weakness in the spec; the spec simply describes the
dictionary. The DTD and schema are artifacts of the spec. It is in the
charter to create a schema; the schema type is unspecified. We could 
become bogged down in a discussion of schemas when the work at hand is
to approve or improve the current spec; with the multiplicity of schemas
we could spend months on this topic alone. It is a good sub-committee
discussion.</jr>

-----
> 2. Does not have entities for EXTRACT and MERGE.
-----

I'm not sure I understand the note. Could you explain what you call
'EXTRACT' and 'MERGE'? Maybe the following description of XLIFF with
regard to extraction and merging will help:

An XLIFF document initially stores the result of an extraction. The
original input is split into two main streams. The localizable data is
in the content of <source> and in various attributes (coord, etc.). Some
original code can also be encapsulated within <source> using the inline
elements: <bpt>, <ept>, <it>, <ph>. The rest of the non-localizable data
is stored in the "skeleton". The skeleton is a separate file that can be
either referenced from the XLIFF document (using the <skl> element with
an <external-file> element), or embedded in an <internal-file> element
(still in the <skl> element).

The translated file is reconstructed (merged) from the skeleton
(wherever it is located) and the content of the <target> elements (which
have been added during the localization process).
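
As a rough illustration of the extract/merge flow described above, here
is a minimal Python sketch. The sample document, the %%id%% placeholder
syntax in the skeleton, and the function name merge() are all
assumptions invented for this example; XLIFF itself does not define the
skeleton format, and each tool is free to use its own.

```python
import xml.etree.ElementTree as ET

# Illustrative XLIFF document. Element names follow the discussion above;
# the file names and strings are invented for the example.
XLIFF = """<xliff version="1.0">
  <file original="hello.txt" source-language="en" datatype="plaintext">
    <header>
      <skl><external-file href="hello.txt.skl"/></skl>
    </header>
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Bonjour</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# Hypothetical skeleton: the non-localizable data, with a %%id%% placeholder
# wherever a trans-unit was extracted. This syntax is an assumption.
SKELETON = "greeting=%%1%%\nfarewell=%%2%%\n"


def merge(xliff_text, skeleton):
    """Rebuild the translated file from the skeleton and the <target> text."""
    root = ET.fromstring(xliff_text)
    out = skeleton
    for unit in root.iter("trans-unit"):
        target = unit.find("target")
        # Fall back to the source text when a unit has no <target> yet.
        node = target if target is not None else unit.find("source")
        out = out.replace("%%" + unit.get("id") + "%%", node.text or "")
    return out


merged = merge(XLIFF, SKELETON)
print(merged)
```

Falling back to the source text for untranslated units is a design
choice of this sketch, not something the format requires.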

<jr>Specific extract and merge entities/elements have purposely been
undefined. The method of obtaining localizable data in the XLIFF file
varies by publisher. Some use databases which contain the localizable
data and some will use skeleton files. Others may use yet another
system. Because we don't want to impose process on the publisher, we've
tried to allow for any process that can produce valid XLIFF. This allows
for a great deal of flexibility to the publisher. 
There are elements defined which are available to the publisher for
these purposes. From the spec, "The <prop-group> element contains
tool-specific information used in combining the data with the skeleton
file or storing the data in a repository." The <prop-group> contains the
<prop> element, which contains the actual tool-specific data. There is
also the ts attribute of the following elements: <file>, <group>,
<trans-unit>, <source>, <target>, <bin-unit>, <bin-source>,
<bin-target>, <alt-trans>, <mrk>, <g>, <x/>, <bx/>, <ex/>, <bpt>, <ept>,
<ph>, <it>. From the spec, "The ts attribute allows you to include short
data understood by a specific toolset." In addition, the <context>
element allows for information of this nature, also.</jr>
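
To make the <prop-group>, <prop> and ts mechanisms a little more
concrete, here is a small Python sketch that reads tool-specific data
back out of a <trans-unit>. The tool name "mytool", the prop-type values
and the content of the ts attribute are invented for illustration; the
spec leaves their meaning to each toolset.

```python
import xml.etree.ElementTree as ET

# Illustrative trans-unit carrying tool-specific data. The values in ts,
# name and prop-type are hypothetical and mean something only to "mytool".
UNIT = """<trans-unit id="42" ts="mytool:row=1017">
  <source>Save changes?</source>
  <prop-group name="mytool-repository">
    <prop prop-type="table">UI_STRINGS</prop>
    <prop prop-type="key">DLG_SAVE_PROMPT</prop>
  </prop-group>
</trans-unit>"""


def tool_data(unit_xml):
    """Return the ts value and a dict of prop-type -> prop content."""
    unit = ET.fromstring(unit_xml)
    props = {p.get("prop-type"): (p.text or "") for p in unit.iter("prop")}
    return unit.get("ts"), props


ts_value, props = tool_data(UNIT)
print(ts_value, props)
```

A database-backed publisher could use values like these to find the
record to update on merge, while a skeleton-based tool could ignore them
entirely.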

-----
> 3. Does not have entities for character map used in saved file (from
> translation).
-----

I see two different meanings here. I'll rephrase the comment in two
different ways to see which one (if any) is the right one:

a) "XLIFF doesn't have a way to indicate what encoding has been used for
the translated text."
That's true: XLIFF uses any appropriate encoding as defined by XML
specs. The mechanism to indicate the encoding used in the translated
XLIFF document is the standard XML encoding declaration.

b) "XLIFF doesn't have a way to indicate what encoding should be used
for the translated text when merging the text into the original format."
That's also true: the assumption (maybe incorrect) is that, knowing
which type of format, which language and which platform the text is
targeted for, the merger tool is responsible for using the appropriate
encoding (possibly with the help of the end-user). This is consistent
with how most current localization tools work. We may need to look at
this more closely.

<jr>Do you mean XLIFF does not have a mechanism for encoding the target
differently from the source? If so, that is true. The assumption is that
the target and source will both be encoded the same way, usually in
UTF-8. However, some mechanism for indicating a different encoding in
the target may be useful.</jr>
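
The two encoding questions above can be separated in a short Python
sketch: the encoding of the XLIFF document itself comes from the XML
declaration, while the encoding of the merged output is chosen by the
merger tool. The sample document and the choice of cp1252 as the output
encoding are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# An XLIFF document serialized in ISO-8859-1 and declaring it in the
# standard XML declaration. The content is invented for the example.
TEXT = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<xliff version="1.0">'
    '<file original="a.txt" source-language="en" datatype="plaintext">'
    "<body><trans-unit id=\"1\">"
    "<source>Cafe</source><target>Caf\u00e9</target>"
    "</trans-unit></body></file></xliff>"
)
data = TEXT.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly, so
# source and target share the document's single encoding.
root = ET.fromstring(data)
target_text = root.find(".//target").text

# On merge, the tool picks the encoding for the output file. cp1252 here
# is a hypothetical choice the merger would make for its target platform.
merged_bytes = target_text.encode("cp1252")
print(repr(target_text), merged_bytes)
```

Nothing in the XLIFF file tells the merger to use cp1252; that decision
lives entirely in the tool, which is the gap being discussed.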

-----
> 4. Target lang should be target+ in 'ELEMENT trans-unit', unless
> that's not intended for the whole job. [Inquiry: what is 'ELEMENT
> trans-unit' intended to handle?]
-----

The <trans-unit> element is the place where the source and one
translation of a given localizable item are stored. An 'item' is not
defined beyond being (most of the time) a run of translatable text. For
example, it can be a string from a Windows RC stringtable group, the
value of a key/value pair of a Java properties file, the content of a
<p> element in HTML, the value of an alt attribute in HTML, etc.
Actually a <trans-unit> is allowed to have empty <source> and <target>.
This is to handle cases where the localizable data is not text but other
information: the coordinates of a control, for example, need to be
represented in case some tools provide capabilities such as resizing.
XLIFF does not explicitly address anything related to segmentation.

XLIFF is intended to handle a source language and ONE target language in
each <file> element. This is a decision that was made very early in the
design of the format, and the structure of XLIFF reflects that
(otherwise we wouldn't have that <source>/<target> pair, for example).
The main reason (as far as I can recall) was that the advantages of
multilingual files were not big enough to be worth the complication. In
addition, it seems that, in some cases, multilingual files even cause
problems in the process: most of the time you have to split the file per
translator anyway. I'm sure others will be able to elaborate on why a
simple bilingual architecture was chosen rather than a multilingual
one.

The use of "target?" (zero or one target) rather than "target+" (one or
more targets) is there to allow a <trans-unit> with only a source text.
I think it was "target?" at the beginning and we changed it to
"target+". Comments anyone?

<jr>The <trans-unit> properly allows only zero or one target for any
<source>. The DTD has it as target?. Alternate translations can be
stored in the <alt-trans> element which contains target+. The targets in
the alt-trans can come from a variety of places including translator
versions and TMs. There is only one allowable target in a trans-unit
because that is considered the current or final version. The strongest
argument against multilingual XLIFF (more than one target language) was
the versioning problem. It would be too difficult to keep the languages
in sync.</jr>
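
The content model described above (at most one <target> directly in a
<trans-unit>, with any number of candidates held in <alt-trans>) can be
sketched in Python as follows. The sample strings, the attribute values
and the function name read_unit() are invented for illustration, and the
check is only a stand-in for real DTD validation.

```python
import xml.etree.ElementTree as ET

# Illustrative trans-unit: one current <target>, plus alternates stored in
# <alt-trans>. Strings and attribute values are invented for the example.
UNIT = """<trans-unit id="7">
  <source>Open file</source>
  <target>Ouvrir le fichier</target>
  <alt-trans match-quality="90%" tool="TM">
    <target>Ouvrir un fichier</target>
  </alt-trans>
  <alt-trans tool="translator-draft">
    <target>Ouvrez le fichier</target>
    <target>Ouverture du fichier</target>
  </alt-trans>
</trans-unit>"""


def read_unit(unit_xml):
    """Return (current target or None, list of alternate targets)."""
    unit = ET.fromstring(unit_xml)
    # findall() with a bare tag name matches direct children only, so the
    # targets inside <alt-trans> are not counted here.
    targets = unit.findall("target")
    if len(targets) > 1:
        raise ValueError("trans-unit %s has more than one target" % unit.get("id"))
    current = targets[0].text if targets else None
    alternates = [t.text for alt in unit.findall("alt-trans")
                  for t in alt.findall("target")]
    return current, alternates


current, alternates = read_unit(UNIT)
print(current, alternates)
```

A unit with only a <source> yields None as the current target, which is
the case "target?" is meant to allow.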

-----
> 5. Does not have QC/Proofer captured.
-----

I think this is captured in the <phase> element. That element is there
to allow tools to flag the progress of the document through the
localization process, and even keep track of the changes through links
using the phase-name attribute. Maybe someone from the "Status-Flags"
sub-group can address this and give an example?

<jr>Yves is quite correct about this. Maybe Tony can give you access
to the DataDefinition Yahoo group so that you can see our
discussions on that topic.</jr>
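
As a hedged illustration of how the phase-name link could capture a
QC/proofing step, the Python sketch below looks up which process last
touched each <target>. The phase names, process names and contacts are
invented, and the exact attribute set on <phase> should be checked
against the spec rather than taken from this example.

```python
import xml.etree.ElementTree as ET

# Illustrative document: two phases declared in the header, and each
# <target> pointing back to one of them through phase-name.
XLIFF = """<xliff version="1.0">
  <file original="a.txt" source-language="en" datatype="plaintext">
    <header>
      <phase-group>
        <phase phase-name="translate" process-name="translation"
               contact-name="A. Translator"/>
        <phase phase-name="proof" process-name="proofreading"
               contact-name="B. Proofer"/>
      </phase-group>
    </header>
    <body>
      <trans-unit id="1">
        <source>Yes</source><target phase-name="proof">Oui</target>
      </trans-unit>
      <trans-unit id="2">
        <source>No</source><target phase-name="translate">Non</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def process_by_unit(xliff_text):
    """Map each trans-unit id to the process-name of its target's phase."""
    root = ET.fromstring(xliff_text)
    phases = {p.get("phase-name"): p.get("process-name")
              for p in root.iter("phase")}
    result = {}
    for unit in root.iter("trans-unit"):
        target = unit.find("target")
        name = target.get("phase-name") if target is not None else None
        result[unit.get("id")] = phases.get(name)
    return result


status = process_by_unit(XLIFF)
print(status)
```

Here unit 1 has been through proofreading while unit 2 has only been
translated, which is the kind of QC tracking the comment asks about.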

-----
> 6. Will need to support non-UTF-8 imported entities (eg. SAE Gen,
> Fordsym, TEI)
-----

I'm not sure if I understand this well. Could you elaborate and maybe
give an example?



-----
> 7. Should support SIO, and have more atts needed for inline elements.
-----

Same here. You lost me with "SIO" :) Does it stand for "Serial Input
Output", "Shift-In (shift)-Out"? Could you elaborate and maybe give a
few examples?

<jr>Please elaborate points 6 & 7.</jr>

Thanks for taking the time to go through this David. Hopefully others
will be able to elaborate on my answers and possibly address the points
I failed (miserably) to understand.

Kind regards,
-yves

<jr>Thanks for looking this over. I hope this explains some things. We
need to get everyone access to the discussions on the DataDefinition
group site.

Cheers,
John</jr>

