
Subject: RE: [xliff] XLIFF 2.0 example files for segmentation

Hi Arle,

The idea of using <segment> and <ignorable> is to give tools the freedom to split and merge segments that belong to the same <unit>.

A tool that creates a <unit> with one or more <segment> elements should not need to care how many <segment> elements the <unit> contains when the time comes to create the translated document.
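
As a rough illustration of that point, here is a minimal Python sketch of how a merging tool could rebuild the translated text of a <unit> without looking at how it was segmented. The element layout (un-namespaced <unit>, <segment> and <ignorable>, each holding <source> and optionally <target>) and the function name unit_target_text are assumptions made for the example, not something the draft fixes.

```python
import xml.etree.ElementTree as ET


def unit_target_text(unit):
    """Concatenate the text of every <segment>/<ignorable> child, in document order."""
    parts = []
    for child in unit:
        if child.tag not in ("segment", "ignorable"):
            continue
        # Prefer <target>; fall back to <source> when no translation exists
        # (typical for an <ignorable> that only holds whitespace).
        node = child.find("target")
        if node is None:
            node = child.find("source")
        parts.append("".join(node.itertext()) if node is not None else "")
    return "".join(parts)


# Hypothetical sample data: the same unit, segmented in two different ways.
ONE_SEGMENT = """<unit id="u1">
  <segment><source>First. Second.</source><target>Un. Deux.</target></segment>
</unit>"""

TWO_SEGMENTS = """<unit id="u1">
  <segment><source>First.</source><target>Un.</target></segment>
  <ignorable><source> </source></ignorable>
  <segment><source>Second.</source><target>Deux.</target></segment>
</unit>"""

if __name__ == "__main__":
    for xml_text in (ONE_SEGMENT, TWO_SEGMENTS):
        print(repr(unit_target_text(ET.fromstring(xml_text))))
```

Both variants yield the same string, because the merge is keyed on the <unit> and never on the number or the ids of its <segment> elements.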

Having <segment> in core makes it possible to create an XLIFF file with <unit> elements containing paragraphs, use a tool that allows translating at sentence level, and finally get a translated file. It would also solve the problem of sentence reordering, because users would be able to merge all <segment> elements of a <unit> and translate the paragraph as one piece of text, obtaining a better translation.
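
To make the merge operation concrete, here is a small Python sketch of what a translation tool might do when the user asks to join all segments of a <unit>. It is a simplification under stated assumptions: the XML is un-namespaced, segments hold plain text without inline markup, and the unit is still untranslated (any existing <target> is discarded). The name merge_segments is made up for the example.

```python
import xml.etree.ElementTree as ET


def merge_segments(unit):
    """Replace all <segment>/<ignorable> children of `unit` with a single <segment>."""
    pieces = [c for c in list(unit) if c.tag in ("segment", "ignorable")]
    if not pieces:
        return unit
    # Join the source text in document order; <ignorable> content (e.g. the
    # whitespace between sentences) becomes part of the merged segment.
    source_text = "".join(
        "".join(src.itertext()) for piece in pieces for src in piece.findall("source")
    )
    position = list(unit).index(pieces[0])
    for piece in pieces:
        unit.remove(piece)
    merged = ET.Element("segment")
    ET.SubElement(merged, "source").text = source_text
    unit.insert(position, merged)
    return unit


# Hypothetical sample data: a paragraph extracted as two sentence-level segments.
PARAGRAPH_UNIT = """<unit id="u1">
  <segment><source>First.</source></segment>
  <ignorable><source> </source></ignorable>
  <segment><source>Second.</source></segment>
</unit>"""

if __name__ == "__main__":
    unit = merge_segments(ET.fromstring(PARAGRAPH_UNIT))
    print(ET.tostring(unit, encoding="unicode"))
```

The <unit> and its id are untouched, so the tool that created the file can still merge the result back.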

In essence, using <segment> moves segmentation from the text extraction domain to the translation domain. Tools that create XLIFF files don't need to worry about segmentation, as that is something translators would be able to adjust at translation time.

Rodolfo M. Raya
Maxprograms http://www.maxprograms.com

-------- Original Message --------
Subject: Re: [xliff] XLIFF 2.0 example files for segmentation
From: Arle Lommel <alommel@gala-global.org>
Date: Wed, November 09, 2011 8:09 pm
To: Yves Savourel <ysavourel@enlaso.com>
Cc: <xliff@lists.oasis-open.org>

Hi Yves et al.,

Perhaps I wasn't terribly clear. I agree with you (at least as far as I understand this): <unit> seems sufficient by itself, without <segment>, for the processing scenarios I envision in the wild. But it is insufficient to accomplish Dave’s example, which really contains at least two scenarios, since there are two variants of the starting and ending XLIFF files.

Look at his second XLIFF snippet, which uses <segment> in it. One of his scenarios was to go from that to the third snippet (which uses different <segment>s) but with the goal of spitting out the final XLIFF snippet (which uses the original <segment>s). Not so easy to do.

If we don't include <segment> in the start and end points, the problem goes away, since the segmentation does not matter for the structural equivalent of the original you are reconstructing. But I was going off Dave's examples, where <unit> is insufficient to get a structurally equivalent file because the <segment>s are refactored.

Using <unit> makes more sense to me and I'm actually struggling to see the use case for <segment>. Perhaps someone can enlighten me on the use case where we would need <segment> in an XLIFF file. In most cases wouldn't the tool handle this internally, as I indicated? There may be a good use case for <segment> and a reason why we'd need it, but I don’t actually see what is gained in Dave's example in most processing scenarios. Forgive me if I'm dense, but I’m going off the examples given.

So what does <segment> accomplish for us? If it's needed, how do we deal with the expectation that one tool that uses it in its files should be able to get back a file with the same <segment> structure after another tool has refactored it (one of Dave’s scenarios)?


On Nov 9, 2011, at 12:54 , Yves Savourel wrote:

> Hi Arle, all,
>> ...I have to admit that I'm a bit confused by the example
>> and the responses. <segment> itself may be very useful,
>> but if tools start playing around with <segment>s as in
>> your example, I think it will lead to all sorts of
>> problems.
>> ...
>> I would expect <segment>s to be immutable from the file
>> that creates them or the ability to roundtrip the data
>> runs a real risk of being broken.
> Mmm... I'm not sure I understand the concern with changing segments from one tool to the other. The extraction tool does not rely on <segment> or its optional id value to merge anything back: it uses <unit>.
> Tools should be able to modify the segmentation inside a <unit>; otherwise, how would translators correct mis-segmented entries, for example? Or how would smart tools re-segment a <unit> based on a TM to get better matches?
> Cheers,
> -yves
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: xliff-help@lists.oasis-open.org
