OASIS Mailing List Archives

xliff message

Subject: RE: [xliff] Translating XLIFF 1.2


Hi Rodolfo,

> Segmentation is a process that can be done before 
> creating an XLIFF file. It can even be done on 
> the fly while the XLIFF file is being generated.

I see many different files and file formats every year, and in my experience the cases where the format drives the segmentation are rather rare. In any case, they can be supported with <seg-source> as well:

<source>My segment.</source>
<seg-source><mrk mid='1' mtype='seg'>My segment.</mrk></seg-source>

In fact, it is very important that <seg-source> be used in those cases: it is the only interoperable way for another tool to know that those trans-units represent single segments.
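To illustrate why <seg-source> is the interoperable signal, here is a minimal Python sketch of how a consuming tool might list the segments of a trans-unit. It assumes the namespace-free fragments used in this message (a real XLIFF 1.2 file places these elements in the urn:oasis:names:tc:xliff:document:1.2 namespace), and the `segments` helper is hypothetical, not part of any XLIFF library.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment, as in the message above (assumption: real files
# use the XLIFF 1.2 namespace, which this sketch ignores).
UNIT = """<trans-unit id='1'>
 <source>My segment.</source>
 <seg-source><mrk mid='1' mtype='seg'>My segment.</mrk></seg-source>
</trans-unit>"""

def segments(trans_unit):
    """Return (mid, text) pairs for each segment marked in <seg-source>.

    Without a <seg-source>, the tool cannot tell how (or whether) the unit
    was segmented, so the whole <source> text is returned with mid=None.
    """
    seg_source = trans_unit.find("seg-source")
    if seg_source is None:
        return [(None, "".join(trans_unit.find("source").itertext()))]
    return [
        (mrk.get("mid"), "".join(mrk.itertext()))
        for mrk in seg_source.iter("mrk")
        if mrk.get("mtype") == "seg"
    ]

unit = ET.fromstring(UNIT)
print(segments(unit))  # [('1', 'My segment.')]
```

Because the segment boundary is stated explicitly with mtype='seg', any tool reading the file can recover it without guessing from the trans-unit structure.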


Note also that the majority of the XLIFF TC holds that segmentation happens mostly after extraction. Otherwise the 1.2 specification would not say:

"It is important to note that the manipulation / segmentation of trans-unit elements is owned by the "translator" domain, not at the extraction filter domain. This means that segmentation will be performed by the editing tool or possibly an automated segmentation process."


> If the text to translate is already segmented, 
> there is absolutely no need to use <seg-source> at all.

On the contrary: it's the perfect opportunity to use <seg-source>.


> The sentences in a paragraph can easily be maintained 
> together using a <group> element to enclose the related 
> <trans-unit> elements. This also optional "segmentation model"
> has been possible since XLIFF 1.0.

First, let's be very clear on one thing: XLIFF first addressed segmentation in version 1.2. So there cannot be an XLIFF-standard, or even a traditional, way to represent segmentation before 1.2. This also means that 1.2 does not have to be backward compatible with any segmentation representation, because none existed from XLIFF's point of view.

I remember very well the first XLIFF meeting I attended, in Dublin, before XLIFF was even named XLIFF. Choosing a name for <trans-unit> prompted a discussion about segmentation, and we decided not to address segmentation. A <source> element is broadly defined as "... unit of text that could be a paragraph, a title, a menu item, a caption, etc."

The representation of segmentation was only addressed in 1.2, when the TC set up a sub-committee to work on it. The result is the <seg-source> model.

So if a specific tool chose to make each <trans-unit> represent the result of segmentation, that was an implementer's choice.


Now, as for using <trans-unit>/<group> to represent a paragraph and its sentences:

While nothing precludes a tool from doing this, it does not follow the recommendation of the 1.2 specification, where we have: "...It is important to note that the manipulation / segmentation of trans-unit elements is owned by the "translator" domain, not at the extraction filter domain."

To me, the groups and trans-units are created by the extraction tool, and I expect to get them back unchanged when merging. If another user-agent starts modifying the structure of the XLIFF document, we are going to have merging problems.

Let's say a filter creates this:

<trans-unit id='1'>
 <source>My segment 1. My segment 2.</source>
</trans-unit>

Then a user-agent specialized in segmentation does this:

<group id='1'>
 <trans-unit id='1-1'>
  <source>My segment 1. </source>
 </trans-unit>
 <trans-unit id='1-2'>
  <source>My segment 2.</source>
 </trans-unit>
</group>

Then I open the result in an editor and translate it, and the editor carefully re-creates the XLIFF it received. I get this:

<group id='1'>
 <trans-unit id='1-1'>
  <source>My segment 1. </source>
  <target>Mon segment 1. </target>
 </trans-unit>
 <trans-unit id='1-2'>
  <source>My segment 2.</source>
  <target>Mon segment 2.</target>
 </trans-unit>
</group>

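Then the merging tool is in trouble: it expects the trans-unit with id='1' that the filter created, and instead finds a group containing two trans-units it never created.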
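The merge failure can be made concrete with a minimal Python sketch. The merge step here is hypothetical: it assumes the filter remembers the trans-unit ids it created and looks up a <target> for each one. The fragment is the editor's output from the example, kept namespace-free and wrapped in a <body> element so it parses as one document.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the ids the extraction filter originally created.
EXTRACTED_IDS = ["1"]

# The editor's output, after the segmentation user-agent restructured the unit.
TRANSLATED = """<body>
<group id='1'>
 <trans-unit id='1-1'>
  <source>My segment 1. </source>
  <target>Mon segment 1. </target>
 </trans-unit>
 <trans-unit id='1-2'>
  <source>My segment 2.</source>
  <target>Mon segment 2.</target>
 </trans-unit>
</group>
</body>"""

def targets_by_id(root):
    """Map each trans-unit id to its <target> text, ignoring any grouping."""
    result = {}
    for tu in root.iter("trans-unit"):
        target = tu.find("target")
        if target is not None:
            result[tu.get("id")] = "".join(target.itertext())
    return result

def merge(extracted_ids, root):
    """Return the translation for each id the filter created; fail if any is missing."""
    found = targets_by_id(root)
    missing = [i for i in extracted_ids if i not in found]
    if missing:
        raise ValueError(f"no translated trans-unit for id(s): {missing}")
    return {i: found[i] for i in extracted_ids}

root = ET.fromstring(TRANSLATED)
try:
    merge(EXTRACTED_IDS, root)
except ValueError as err:
    print(err)  # no translated trans-unit for id(s): ['1']
```

Real merge tools differ in detail; the point is only that any merge keyed on the filter's own ids breaks once another user-agent restructures the units.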

Using <trans-unit>/<group> to represent segmentation is very bad, in my opinion, because it prevents other tools from easily modifying the segmentation.

I can understand that in some cases the <trans-unit> becomes a segment unit because the original file format dictates it. But, again, this is a rare case, and it is also supported by the <seg-source> model.
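By contrast, a segmentation user-agent could record its split in <seg-source> and leave the filter's trans-unit untouched. The Python sketch below assumes the same namespace-free fragments, plain-text sources without inline codes, and a hypothetical `add_seg_source` helper that receives the segmenter's split as a list of strings.

```python
import xml.etree.ElementTree as ET

# The unit exactly as the filter produced it (namespace-free, as in the message).
FILTER_OUTPUT = """<trans-unit id='1'>
 <source>My segment 1. My segment 2.</source>
</trans-unit>"""

def add_seg_source(trans_unit, pieces):
    """Record a segmentation in <seg-source> without touching the id or <source>.

    `pieces` is the segmenter's split of the <source> text (hypothetical input;
    a real segmenter would also have to carry inline elements over).
    """
    source = trans_unit.find("source")
    if "".join(pieces) != "".join(source.itertext()):
        raise ValueError("segments must concatenate back to the <source> text")
    seg_source = ET.Element("seg-source")
    for mid, text in enumerate(pieces, start=1):
        mrk = ET.SubElement(seg_source, "mrk", mid=str(mid), mtype="seg")
        mrk.text = text
    # XLIFF 1.2 places <seg-source> immediately after <source>.
    trans_unit.insert(list(trans_unit).index(source) + 1, seg_source)
    return trans_unit

unit = ET.fromstring(FILTER_OUTPUT)
add_seg_source(unit, ["My segment 1. ", "My segment 2."])
print(ET.tostring(unit, encoding="unicode"))
```

The unit keeps id='1' and its original <source>, so a merge keyed on the filter's ids still finds it, and another tool can re-segment by replacing only the <seg-source>.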


Cheers,
-ys





