xliff-seg message

Subject: Segmentation representation and scenario
From: "Yves Savourel" <ysavourel@translate.com>
To: "XLIFF Segmentation" <xliff-seg@lists.oasis-open.org>
Date: Tue, 11 May 2004 08:17:14 -0600
Some ideas on segmentation representation:

For representing the segmentation inside a <trans-unit> I would use the
<mrk> element:

<trans-unit id='2'>
 <source xml:lang='en'><mrk mid='2-1' mtype='phrase'>This is the second
entry of the file.</mrk>
<mrk mid='2-2' mtype='phrase'>This is the second sentence of the second
entry.</mrk></source>
 <target xml:lang='fr'><mrk mid='2-1' mtype='phrase'>Ceci est la seconde
entrée du fichier.</mrk>
<mrk mid='2-2' mtype='phrase'>Ceci est la seconde phrase de la seconde
entrée.</mrk></target>
</trans-unit>

- It's part of the existing specifications.
- It's unintrusive: mergers are supposed to ignore it.
- We can have a set of specific extended attributes if we want to store
sentence-level information.
- We would probably need to add an mtype value specific to a 'segment'
('phrase' is not good enough).
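
To illustrate how a tool might read this representation, here is a rough
Python sketch. It assumes an XLIFF fragment without a namespace, as in the
example above, and pairs source and target segments by their mid value. The
function names (segments, aligned_segments, unsegmented_text) are made up
for the illustration and are not part of any XLIFF tool.

```python
import xml.etree.ElementTree as ET

# Sample unit modeled on the example above (no XML namespace assumed).
SAMPLE = """<trans-unit id='2'>
 <source xml:lang='en'><mrk mid='2-1' mtype='phrase'>This is the second entry of the file.</mrk>
<mrk mid='2-2' mtype='phrase'>This is the second sentence of the second entry.</mrk></source>
 <target xml:lang='fr'><mrk mid='2-1' mtype='phrase'>Ceci est la seconde entrée du fichier.</mrk>
<mrk mid='2-2' mtype='phrase'>Ceci est la seconde phrase de la seconde entrée.</mrk></target>
</trans-unit>"""


def segments(element):
    """Return {mid: text} for every <mrk> inside element, in document order."""
    return {
        mrk.get("mid"): "".join(mrk.itertext()).strip()
        for mrk in element.iter("mrk")
    }


def aligned_segments(trans_unit):
    """Pair each source segment with the target segment sharing its mid."""
    source = segments(trans_unit.find("source"))
    target = segments(trans_unit.find("target"))
    # A source segment with no matching target mid is paired with None.
    return [(mid, text, target.get(mid)) for mid, text in source.items()]


def unsegmented_text(element):
    """Text as seen by a merger that ignores the <mrk> markers."""
    return "".join(element.itertext())


if __name__ == "__main__":
    unit = ET.fromstring(SAMPLE)
    for mid, src, tgt in aligned_segments(unit):
        print(mid, "|", src, "|", tgt)
```

Pairing by mid only works if the mid values are unique within each of
<source> and <target>, which is one reason to keep them distinct.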

I agree that translation tools should be able to provide their own
segmentation within a <trans-unit>, and that this could be done during the
translation itself (by the translator).

I also think that a translation tool should be able to use any existing
match at the <trans-unit> level as well: there is no reason to go to a finer
granularity if a match is already available at the <trans-unit> level. That
said, there is obviously a threshold of usability for fuzzy matches at the
<trans-unit> level, and that threshold most likely depends on the size of
the text in the <trans-unit> (for large units, the differences between the
new source and the old one may be more difficult to see).

I think a translation process should be able to take advantage of such high
matches obtained without the translation tool and without segmentation of
the <trans-unit> content. Translation tools should allow the verification of
such matches during the translation.

For example, imagine a project where version 2 of a software product is to
be localized. A translated version 1 exists, but no TM. One can easily
create a "TM" of <trans-unit>-level entries without complex tools, and one
should be able to re-use high matches from that "TM" regardless of what
segmentation is used by the translation tools.
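
The scenario above could be sketched roughly as follows in Python. The
sketch assumes XLIFF documents without a namespace and only handles exact
matches; build_tm, pretranslate and the sample documents are invented for
the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical version 1 file: already translated, no TM available.
V1 = """<xliff version='1.0'><file original='app' source-language='en'
 target-language='fr' datatype='plaintext'><body>
<trans-unit id='1'><source>Open the file.</source><target>Ouvrir le fichier.</target></trans-unit>
<trans-unit id='2'><source>Save the file.</source><target>Enregistrer le fichier.</target></trans-unit>
</body></file></xliff>"""

# Hypothetical version 2 file: unit 1 now carries <mrk> segmentation.
V2 = """<xliff version='1.0'><file original='app' source-language='en'
 target-language='fr' datatype='plaintext'><body>
<trans-unit id='1'><source><mrk mid='1-1' mtype='phrase'>Open the file.</mrk></source></trans-unit>
<trans-unit id='3'><source>Print the file.</source></trans-unit>
</body></file></xliff>"""


def unit_text(element):
    """Flatten inline markup (including <mrk>) and normalize whitespace."""
    return " ".join("".join(element.itertext()).split())


def build_tm(xliff_text):
    """Build a {source text: target text} map at the <trans-unit> level."""
    tm = {}
    for unit in ET.fromstring(xliff_text).iter("trans-unit"):
        source, target = unit.find("source"), unit.find("target")
        if source is not None and target is not None:
            tm[unit_text(source)] = unit_text(target)
    return tm


def pretranslate(xliff_text, tm):
    """Return {trans-unit id: translation, or None if the TM has no match}."""
    result = {}
    for unit in ET.fromstring(xliff_text).iter("trans-unit"):
        source = unit.find("source")
        if source is not None:
            result[unit.get("id")] = tm.get(unit_text(source))
    return result


if __name__ == "__main__":
    print(pretranslate(V2, build_tm(V1)))
```

Because the lookup key is the flattened text of the whole <source>, the
match for unit 1 is found even though version 2 has <mrk> segmentation and
version 1 does not.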


Cheers,
-yves