RE: [xliff] XLIFF 2.0 example files for segmentation

-------- Original Message --------
Subject: Re: [xliff] XLIFF 2.0 example files for segmentation
From: Arle Lommel <alommel@gala-global.org>
Date: Wed, November 09, 2011 8:57 pm
To: "Rodolfo M. Raya" <rmraya@maxprograms.com>
Cc: "XLIFF TC" <xliff@lists.oasis-open.org>

OK. So the scenario I pulled from Dave's example (a file with <segment>s is refactored into a file with different <segment>s but has to be restored to the original <segment>s) is a chimera. There would be no processing expectation that <segment>s are preserved or reconstructable, and a tool MUST NOT rely on the <segment> element to reconstruct the source file. If that is correct, then Dave's example would need to lose snippets two and five. Is that right?

> In essence, using <segment> moves segmentation from the text extraction domain to the translation domain.

In most cases, I think that makes sense. The tool needs the freedom to segment in whatever manner it sees as appropriate.

> Tools that create XLIFF files don't need to worry about segmentation, as that is something that translators would be able to adjust at translation time.

If XLIFF files “don’t need to worry about segmentation…”, doesn't that mean that this is not a core feature? If segmentation is moved to the translation domain and adjusted at translation time, aren't we saying that segmentation is really a function of the import filter (and hence beyond the scope of XLIFF), just as it would be for any other file format? I don't need to indicate segmentation in Word or IDML, so what makes XLIFF different in this regard?

What is the use case in which you would need <segment> in the XLIFF file, where that need couldn't be met by a non-XLIFF process inside the tool?

I still don't see that clearly enough.

-Arle


Arle Lommel
Standards Coordinator
GALA Standards Initiative
+1 (707) 709 8650 (GMT -4)
Skype: arle_lommel
LinkedIn: www.linkedin.com/in/arlelommel

The GALA Standards Initiative promotes the effective use of standards for international and multilingual content, builds awareness of best practices for their implementation, and helps the localization community make open standards work.

On Nov 9, 2011, at 13:42, Rodolfo M. Raya wrote:

Hi Arle,

The idea of using <segment> and <ignorable> is to allow freedom for splitting and merging segments that belong to the same <unit>.

A tool that creates a <unit> with one or more <segment> elements should not care how many <segment> elements the <unit> contains when the time comes to create the translated document.

Having <segment> in the core will make it possible to create an XLIFF file with <unit> elements containing paragraphs, use a tool that allows translating at the sentence level, and finally get a translated file. This would also solve the problem of sentence reordering, because users would be able to merge all <segment> elements in a <unit> and translate the paragraph as one piece of text, obtaining a better translation.

In essence, using <segment> moves segmentation from the text extraction domain to the translation domain. Tools that create XLIFF files don't need to worry about segmentation, as that is something that translators would be able to adjust at translation time.
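To make the idea concrete, here is a minimal Python sketch under stated assumptions: a hypothetical in-memory model (the class and function names are illustrative, not taken from any XLIFF tool or from the draft spec) in which a <unit> is an ordered list of <segment> and <ignorable> parts, and the merger rebuilds the unit by concatenating all parts. Under that assumption, splitting or merging segments inside the unit never changes what the merger sees.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical model: a <unit> is an ordered list of parts, each either a
# <segment> (translatable text) or an <ignorable> (e.g. whitespace between sentences).
@dataclass
class Segment:
    source: str

@dataclass
class Ignorable:
    source: str

Part = Union[Segment, Ignorable]

def unit_text(parts: List[Part]) -> str:
    """Rebuild the unit's full text by concatenating all parts in order.
    Segment boundaries play no role in this step."""
    return "".join(p.source for p in parts)

def merge_all(parts: List[Part]) -> List[Part]:
    """Re-segment a unit into a single segment, e.g. so a translator can
    reorder sentences and translate the paragraph as one piece of text."""
    return [Segment(unit_text(parts))]

def split_sentences(parts: List[Part]) -> List[Part]:
    """Naive re-segmentation (assumption: '. ' marks a sentence boundary).
    The space after each period is kept as an Ignorable, so no text is lost."""
    pieces = unit_text(parts).split(". ")
    result: List[Part] = []
    for i, piece in enumerate(pieces):
        if i < len(pieces) - 1:
            result.append(Segment(piece + "."))
            result.append(Ignorable(" "))
        else:
            result.append(Segment(piece))
    return result

if __name__ == "__main__":
    original = [Segment("Open the file."), Ignorable(" "), Segment("Click Save.")]
    merged = merge_all(original)
    # Different segmentation, same unit text for the merger.
    assert unit_text(merged) == unit_text(original)
    print(len(original), len(merged))
```

The point of the sketch is only that the merger works at the <unit> level, so the number of <segment> elements inside it is irrelevant at merge time.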

Regards,

Rodolfo

--

Rodolfo M. Raya
Maxprograms http://www.maxprograms.com

-------- Original Message --------
Subject: Re: [xliff] XLIFF 2.0 example files for segmentation
From: Arle Lommel <alommel@gala-global.org>
Date: Wed, November 09, 2011 8:09 pm
To: Yves Savourel <ysavourel@enlaso.com>
Cc: <xliff@lists.oasis-open.org>

Hi Yves et al.,

Perhaps I wasn't terribly clear. I actually agree with you (at least as far as I understand this): <unit> seems sufficient by itself, without <segment>, for the processing scenarios I envision in the wild. But it is insufficient to accomplish Dave’s example, which actually contains at least two scenarios, since there are two variants of the start and end XLIFF files.

Look at his second XLIFF snippet, which uses <segment>. One of his scenarios was to go from that to the third snippet (which uses different <segment>s), but with the goal of spitting out the final XLIFF snippet (which uses the original <segment>s). Not so easy to do.

If we don't include <segment> in the start and end points, the problem goes away, since segmentation does not matter in the structural equivalent of the original that you are reconstructing. But I was going off Dave's examples, where <unit> is insufficient to get a structurally equivalent file because the <segment>s are refactored.
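A small Python sketch of why restoring the original <segment>s is hard. It assumes a hypothetical representation (not anything from the spec) in which a segmentation is a list of (start, end) character offsets into the unit's source text: once a second tool refactors the unit, its file only exposes its own boundaries, and the first tool's segments come back only if that tool kept its offsets somewhere outside the shared file.

```python
from typing import List, Tuple

# Hypothetical representation: a segment boundary as (start, end)
# character offsets into the unit's full source text.
Span = Tuple[int, int]

def spans_of(segments: List[str]) -> List[Span]:
    """Compute the boundary offsets of a given segmentation."""
    spans: List[Span] = []
    pos = 0
    for seg in segments:
        spans.append((pos, pos + len(seg)))
        pos += len(seg)
    return spans

def apply_spans(text: str, spans: List[Span]) -> List[str]:
    """Cut the unit text back into segments using previously recorded offsets."""
    return [text[start:end] for start, end in spans]

# Tool A's segmentation of one unit.
tool_a = ["Open the file. ", "Click Save."]
saved = spans_of(tool_a)  # exists only if tool A recorded it itself

# Tool B refactors the unit into a single segment; the source text is unchanged.
tool_b = ["".join(tool_a)]
text = "".join(tool_b)

# From tool B's file alone, only tool B's boundaries are visible.
assert spans_of(tool_b) == [(0, 26)]
# Tool A's original segments can be rebuilt only from the recorded offsets.
assert apply_spans(text, saved) == tool_a
```

Even this only covers the source side: offsets say nothing about where to cut a target that was translated as one merged segment, which is why the round trip in Dave's scenario cannot be done mechanically.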

Using <unit> makes more sense to me, and I'm struggling to see the use case for <segment>. Perhaps someone can enlighten me on a use case where we would need <segment> in an XLIFF file. In most cases, wouldn't the tool handle this internally, as I indicated? There may be a good use case for <segment> and a reason why we'd need it, but I don’t see what is gained in Dave's example in most processing scenarios. Forgive me if I'm dense, but I’m going off the examples given.

So what does <segment> accomplish for us? If it's needed, how do we deal with the expectation that one tool that uses it in its files should be able to get back a file with the same <segment> structure after another tool has refactored it (one of Dave’s scenarios)?

-Arle

On Nov 9, 2011, at 12:54, Yves Savourel wrote:

> Hi Arle, all,
>
>> ...I have to admit that I'm a bit confused by the example
>> and the responses. <segment> itself may be very useful,
>> but if tools start playing around with <segment>s as in
>> your example, I think it will lead to all sorts of
>> problems.
>> ...
>> I would expect <segment>s to be immutable from the file
>> that creates them or the ability to roundtrip the data
>> runs a real risk of being broken.
>
> Mmm... I'm not sure I understand the concern with changing segments from one tool to the other. The extraction tool does not rely on <segment> or its optional id value to merge anything back: it uses <unit>.
>
> Tools should be able to modify the segmentation inside a <unit>; otherwise, how would translators correct mis-segmented entries, for example? Or how would smart tools re-segment a <unit> based on a TM to get better matches, etc.?
>
> Cheers,
> -yves
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: xliff-help@lists.oasis-open.org
>