OASIS Mailing List Archives



xliff-seg message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Segmentation representation and scenario

Some ideas on segmentation representation:

For representing the segmentation inside a <trans-unit> I would use the
<mrk> element:

<trans-unit id='2'>
 <source xml:lang='en'><mrk mid='2-1' mtype='phrase'>This is the second
entry of the file.</mrk>
<mrk mid='2-2' mtype='phrase'>This is the second sentence of the second
entry.</mrk></source>
 <target xml:lang='fr'><mrk mid='2-1' mtype='phrase'>Ceci est la deuxième
entrée du fichier.</mrk>
<mrk mid='2-2' mtype='phrase'>Ceci est la seconde phrase de la deuxième
entrée.</mrk></target>
</trans-unit>

- It's part of the existing specifications.
- It's unobtrusive: mergers are supposed to ignore it.
- We can have a set of specific extended attributes if we want to store
sentence-level information.
- We would probably need to add an mtype value specific to a 'segment'
('phrase' is not good enough).
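
To make the first two points concrete, here is a minimal Python sketch
(standard library only). It reads the segments from the <mrk> elements, and
it also shows that a merger which ignores <mrk> still gets the complete
text. The sample unit and the function names are made up for illustration,
and the XLIFF namespace and xml:lang are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, modelled on the <trans-unit> above.
UNIT = (
    "<trans-unit id='2'>"
    "<source>"
    "<mrk mid='2-1' mtype='phrase'>First sentence.</mrk> "
    "<mrk mid='2-2' mtype='phrase'>Second sentence.</mrk>"
    "</source>"
    "</trans-unit>"
)

def segments(content):
    """Return (mid, text) pairs for each <mrk> child of a <source> or <target>."""
    return [(m.get("mid"), "".join(m.itertext())) for m in content.findall("mrk")]

def plain_text(content):
    """What a merger that ignores <mrk> would see: the text with the markers dropped."""
    return "".join(content.itertext())

source = ET.fromstring(UNIT).find("source")
print(segments(source))    # one (mid, text) pair per segment
print(plain_text(source))  # the full, unsegmented text
```

The same element yields both views, which is why the markers do not get in
the way of a tool that has no interest in segmentation.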

I agree that translation tools should be able to provide their own
segmentation within a <trans-unit>, and that this segmentation may be done
during the translation itself (by the translator).

I also think that a translation tool should be able to use any existing
match at the <trans-unit> level as well: there is no reason to go to a finer
granularity if a match is already available at the <trans-unit> level. That
said, there is obviously a threshold of usability for fuzzy matches at the
<trans-unit> level, and that threshold most likely depends on the size of
the text in the <trans-unit> (for large units, the differences between the
new source and the old one may be more difficult to see).
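
One way to picture a size-dependent threshold is sketched below in Python.
This is only an illustration: the formula, the constants, and the use of
difflib as the similarity measure are all assumptions, not something
proposed in this thread or used by any particular tool.

```python
from difflib import SequenceMatcher

def similarity(old_source, new_source):
    """Similarity ratio in [0, 1] between two source texts (difflib heuristic)."""
    return SequenceMatcher(None, old_source, new_source).ratio()

def threshold(text, base=0.75, step=0.05, chars_per_step=100, cap=0.95):
    """Hypothetical rule: demand a higher score as the unit gets longer."""
    return min(cap, base + step * (len(text) // chars_per_step))

def is_usable(old_source, new_source):
    """True if the old unit-level match is close enough to offer to the translator."""
    return similarity(old_source, new_source) >= threshold(new_source)
```

With these made-up constants, a short unit is accepted at a lower score than
a unit of several hundred characters, where a small difference is easier to
miss.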

I think a translation process should be able to take advantage of such high
matches obtained without the translation tool and without segmentation of
the <trans-unit> content. Translation tools should allow the verification of
such matches during the translation.

For example: one can imagine a project where version 2 of a software
product is to be localized. A version 1 with translation exists, but no TM.
One can easily create a "TM" without complex tools for <trans-unit> level
entries. One should be able to re-use high matches from that "TM" regardless
of what segmentation is used by the translation tools.
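
As a rough sketch of this scenario in Python, a dictionary keyed on the
whole <trans-unit> source is enough to act as the "TM". The sample data and
function names are hypothetical, the XLIFF namespace is left out, and exact
lookup stands in for whatever matching a real process would use.

```python
import xml.etree.ElementTree as ET

# Hypothetical version 1 fragment with existing translations.
V1 = """<body>
<trans-unit id='1'><source>Open the file.</source><target>Ouvrir le fichier.</target></trans-unit>
<trans-unit id='2'><source>Close the file.</source><target>Fermer le fichier.</target></trans-unit>
</body>"""

# Hypothetical version 2 fragment, not yet translated.
V2 = """<body>
<trans-unit id='1'><source>Open the file.</source></trans-unit>
<trans-unit id='3'><source>Print the file.</source></trans-unit>
</body>"""

def text_of(elem):
    """Text of an element with any inline markup (such as <mrk>) ignored."""
    return "".join(elem.itertext())

def build_tm(xml_text):
    """Map each whole-unit source text to its existing translation."""
    tm = {}
    for unit in ET.fromstring(xml_text).iter("trans-unit"):
        source, target = unit.find("source"), unit.find("target")
        if source is not None and target is not None:
            tm[text_of(source)] = text_of(target)
    return tm

def leverage(xml_text, tm):
    """Map each unit id to its unit-level match, or None if there is none."""
    units = ET.fromstring(xml_text).iter("trans-unit")
    return {u.get("id"): tm.get(text_of(u.find("source"))) for u in units}

tm = build_tm(V1)
print(leverage(V2, tm))
```

Because text_of() drops any <mrk> markers before the lookup, the match does
not depend on how either file was segmented.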

