OASIS Mailing List Archives

xliff message


Subject: RE: [xliff] comments on dtd

Thanks for posting those comments David.

I'll try to answer a few of them. Since we haven't worked together yet, there
may be some terms we don't use the same way: if I'm not clear, please let me
know and I'll try to reformulate.

> 1. document validators - we should have support for W3C Schema, Schematron
and RELAX NG, as well as DTD.

I agree that we should have different ways to specify XLIFF so different
people using different tools can have easy access to it. We can probably
generate some of those schemas (or at least a base to work from) from the
DTD using converters, as Christian showed me yesterday. I guess we should
open the discussion on which schemas to use besides the DTD.

> 2. Does not have entities for EXTRACT and MERGE.

I'm not sure I understand the note. Could you explain what you mean by
'EXTRACT' and 'MERGE'? Maybe the following description of XLIFF with regard
to extraction and merging will help:

An XLIFF document initially stores the result of an extraction. The original
input is split into 2 main streams. The localizable data are in the content
of <source> and in various attributes (coord, etc.). Some original code can
also be encapsulated within <source> using the inline elements: <bpt>,
<ept>, <it>, <ph>. The rest of the non-localizable data is stored in the
"skeleton". The skeleton is a separate file that can be either referenced
from the XLIFF document (using the <skl> element with an <external-file>
element), or embedded in an <internal-file> element (still in the <skl>
element).

The translated file is reconstructed (merged) from the skeleton (wherever
it is located) and the content of the <target> elements (which have been
added during the localization process).
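
To make the extraction/merge description above more concrete, here is a
minimal Python sketch. It is only an illustration under stated assumptions:
the attribute names on <file>, the sample strings, and above all the skeleton
syntax (a {{id}} placeholder per <trans-unit>) are hypothetical, since XLIFF
does not say what a skeleton looks like. Falling back to <source> when a unit
has no <target> is also a choice made for this sketch, not something the
format requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF-like document. Element names follow the description
# above; attribute names and values are assumptions for illustration.
XLIFF_DOC = """<xliff version="1.0">
  <file original="hello.properties" source-language="en" target-language="fr">
    <header>
      <skl><external-file href="hello.properties.skl"/></skl>
    </header>
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Bonjour</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Press <ph id="1">%s</ph> to quit</source>
        <target>Appuyez sur <ph id="1">%s</ph> pour quitter</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

# Hypothetical skeleton: the non-localizable part of the original file,
# with one {{id}} placeholder per extracted item (an assumed convention).
SKELETON = "greeting={{1}}\nquit={{2}}\n"


def merge(xliff_text, skeleton):
    """Rebuild the translated file from the skeleton and the <target> text."""
    root = ET.fromstring(xliff_text)
    merged = skeleton
    for unit in root.iter("trans-unit"):
        target = unit.find("target")
        if target is None:
            # Assumption of this sketch: untranslated units keep the source.
            target = unit.find("source")
        # itertext() also picks up the original code kept in inline
        # elements such as <ph>, so it is restored in the output.
        text = "".join(target.itertext())
        merged = merged.replace("{{" + unit.get("id") + "}}", text)
    return merged


if __name__ == "__main__":
    print(merge(XLIFF_DOC, SKELETON))
```

A real merger would first locate the skeleton through <external-file> or
<internal-file>; the sketch passes it in directly to stay short.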

> 3. Does not have entities for character map used in saved file (from

I see two different meanings here. I'll rephrase the comment in two
different ways to see which one (if any) is the right one:

a) "XLIFF doesn't have a way to indicate what encoding has been used for the
translated text."
That's true: XLIFF uses any appropriate encoding as defined by XML specs.
The mechanism to indicate the encoding used in the translated XLIFF document
is the standard XML encoding declaration.

b) "XLIFF doesn't have a way to indicate what encoding should be used for
the translated text when merging the text into the original format."
That's also true: the assumption (maybe incorrect) is that, knowing which
type of format, which language and which platform the text is targeted for,
the merger tool is responsible for using the appropriate encoding (possibly
with the help of the end-user). This is consistent with how most current
localization tools work. We may need to look at this more closely.
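
As a small illustration of point (a), the Python sketch below parses the same
element stored in two different encodings and relies only on the standard XML
encoding declaration to decode it. The sample text and the names
(LATIN1_BYTES, UTF8_BYTES, read_target) are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same content serialized in two encodings. Only the XML declaration
# tells the parser how to decode the bytes that follow.
LATIN1_BYTES = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><target>Café fermé</target>'
).encode("iso-8859-1")
UTF8_BYTES = (
    '<?xml version="1.0" encoding="UTF-8"?><target>Café fermé</target>'
).encode("utf-8")


def read_target(raw_bytes):
    """Parse raw bytes; the parser honors the encoding declaration."""
    return ET.fromstring(raw_bytes).text


if __name__ == "__main__":
    # The byte sequences differ, but the decoded text is identical.
    print(LATIN1_BYTES != UTF8_BYTES)
    print(read_target(LATIN1_BYTES) == read_target(UTF8_BYTES))
```

Nothing here covers point (b): the encoding of the merged output in the
original format is still left to the merger tool.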

> 4. Target lang should be target+ in 'ELEMENT trans-unit', unless that's
not intended for the whole job. [Inquiry: what is 'ELEMENT trans-unit'
intended to handle?]

The <trans-unit> element is the place where the source and one translation
of a given localizable item are stored. An 'item' is not defined beyond being
(most of the time) a run of translatable text. For example, it can be a
string from a Windows RC stringtable group, the value of a key/value pair in
a Java properties file, the content of a <p> element in HTML, the value of an
alt attribute in HTML, etc.
Actually, a <trans-unit> is allowed to have empty <source> and <target>. This
is to handle cases where the localizable data is not text but other
information, such as the coordinates of a control: it needs to be
represented in case some tools provide capabilities such as resizing.
XLIFF does not explicitly address anything related to segmentation.

XLIFF is intended to handle a source language and ONE target language in
each <file> element. This is a decision that was made very early in the
design of the format, and the structure of XLIFF reflects that (otherwise we
wouldn't have that <source>/<target> pair, for example). The main reason (as
far as I can recall) was that the advantages of having multilingual files
were not big enough to be worth the complication. In addition, it seems that,
in some cases, multilingual files even cause problems in the process: most
of the time you have to split the file per translator anyway. I'm sure others
will be able to elaborate on why a simple bilingual architecture was chosen
rather than a multilingual one.

The use of "target?" (zero or one target) rather than "target+" (one or more
targets) is there to allow a <trans-unit> with only a source text. I think it
was "target?" at the beginning and we changed it to "target+". Comments anyone?

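To spell out the difference between the two DTD occurrence indicators
discussed above, here is a minimal Python sketch. It does not validate
against the real DTD; it only counts <target> children of a <trans-unit> and
applies the rule for "?" or "+". The function name and the sample units are
hypothetical.

```python
import xml.etree.ElementTree as ET


def target_count_ok(trans_unit_xml, indicator):
    """Check the number of <target> children against a DTD indicator.

    '?' means zero or one occurrence, '+' means one or more.
    """
    count = len(ET.fromstring(trans_unit_xml).findall("target"))
    if indicator == "?":
        return count <= 1
    if indicator == "+":
        return count >= 1
    raise ValueError("unsupported occurrence indicator: " + indicator)


# A unit fresh out of extraction: source text only, no translation yet.
SOURCE_ONLY = '<trans-unit id="1"><source>Hello</source></trans-unit>'

# A unit carrying two translations, as a multilingual design would need.
TWO_TARGETS = (
    '<trans-unit id="1"><source>Hello</source>'
    '<target>Bonjour</target><target>Hallo</target></trans-unit>'
)

if __name__ == "__main__":
    for name, unit in (("source only", SOURCE_ONLY), ("two targets", TWO_TARGETS)):
        for indicator in ("?", "+"):
            print(name, "target" + indicator, target_count_ok(unit, indicator))
```

With "target?" the source-only unit is accepted and the two-target unit is
rejected; with "target+" it is the other way around.
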
> 5. Does not have QC/Proofer captured.

I think this is captured in the <phase> element. That element is there to
allow tools to flag the progress of the document through the localization
process, and even keep track of the changes through links using the
phase-name attribute. Maybe someone from the "Status-Flags" sub-group can
address this and give an example?
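
Pending an answer from the sub-group, here is a rough Python sketch of how a
tool might use phase-name links to find which units went through a
proofreading phase. Only <phase> and phase-name come from the description
above. The <phase-group> wrapper, the process-name attribute, the phase
values, and the placement of phase-name on <target> are assumptions made for
illustration and may not match the DTD under discussion.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: phases are declared in the header and each
# <target> points back at one of them through phase-name (assumed layout).
XLIFF_DOC = """<xliff version="1.0">
  <file original="menu.rc" source-language="en" target-language="de">
    <header>
      <phase-group>
        <phase phase-name="translate" process-name="translation"/>
        <phase phase-name="proof" process-name="proofreading"/>
      </phase-group>
    </header>
    <body>
      <trans-unit id="1">
        <source>File</source>
        <target phase-name="proof">Datei</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Edit</source>
        <target phase-name="translate">Bearbeiten</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def units_by_phase(xliff_text):
    """Group trans-unit ids by the phase their <target> refers to."""
    root = ET.fromstring(xliff_text)
    # Start from the declared phases so that empty phases still show up.
    result = {phase.get("phase-name"): [] for phase in root.iter("phase")}
    for unit in root.iter("trans-unit"):
        target = unit.find("target")
        if target is not None and target.get("phase-name") in result:
            result[target.get("phase-name")].append(unit.get("id"))
    return result


if __name__ == "__main__":
    print(units_by_phase(XLIFF_DOC))
```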

> 6. Will need to support non-UTF-8 imported entities (eg. SAE Gen, Fordsym,

I'm not sure if I understand this well. Could you elaborate and maybe give
an example?

> 7. Should support SIO, and have more atts needed for inline elements.

Same here. You lost me with "SIO" :) Does it stand for "Serial Input
Output", "Shift-In (shift)-Out"? Could you elaborate and maybe give a few
examples?
Thanks for taking the time to go through this, David. Hopefully others will be
able to elaborate on my answers and possibly address the points I failed
(miserably) to understand.

Kind regards,
