xliff message

Subject: RE: [xliff] Generic mechanism for translation candidate elements and other annotations

From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff@lists.oasis-open.org>
Date: Wed, 7 Mar 2012 08:13:21 -0700

Hi Fredrik, Rodolfo, all,

F> The most straight forward way to do that that 
F> I came up with would be to have per document
F> unique IDs for all referable elements.

I'm not sure per-document unique IDs would work, as one could add or remove <file> elements in a document.
But per-<file> unique IDs would certainly allow a much cleaner way to establish the relationships.

But would all IDs be included in that set? Or only the IDs for <unit>, <segment> and <mrk> (and <data> in <originalData>)? What about IDs of inline codes (<ph>, etc.)?
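
To make the per-<file> scope concrete, here is a rough sketch of a uniqueness check. The set of "referable" element names is an assumption (it is exactly the open question above), and the sample document is made up and uses no namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: which elements count as "referable" is still undecided.
REFERABLE = {"unit", "segment", "mrk", "data"}

# Made-up sample: "s1" is reused inside f1; "u1" is reused across files.
DOC = """<xliff>
  <file id="f1">
    <unit id="u1"><segment id="s1"><source>Hello <mrk id="m1">world</mrk></source></segment></unit>
    <unit id="u2"><segment id="s1"><source>Bye</source></segment></unit>
  </file>
  <file id="f2">
    <unit id="u1"><segment id="s2"><source>Again</source></segment></unit>
  </file>
</xliff>"""

def duplicate_ids_per_file(root):
    """Map each <file> id to the ids used more than once among referable elements."""
    result = {}
    for file_elem in root.iter("file"):
        counts = Counter(
            e.get("id")
            for e in file_elem.iter()
            if e.tag in REFERABLE and e.get("id") is not None
        )
        result[file_elem.get("id")] = sorted(i for i, n in counts.items() if n > 1)
    return result

dups = duplicate_ids_per_file(ET.fromstring(DOC))
print(dups)
```

With this scope, reusing "u1" in two different <file> elements is not a conflict, while reusing "s1" within the same <file> is.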


R> The effect of re-segmentation over matches is not
R> new. This time we have to add processing expectations
R> that require updating matches according to the changes.

It's certainly true. But any change that wouldn't involve keeping the information about the original span of content that was associated with the match would essentially be a loss of information.
But maybe that is OK.


R> There may be a need to know what section of <source> 
R> is being matched and the relevant information should 
R> live in the corresponding <match> element, keeping the 
R> original <source> clean. This can be done, for example, 
R> by using 2 attributes: one attribute indicates the offset 
R> where the match starts and the other indicates the 
R> length of the text matched (in both cases ignoring tags).

Using start/length (or start/end) positions is something that we have not explored much.
It could be a way to replace <mrk> completely.
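
As a rough illustration of the start/length idea, here is a minimal sketch that resolves a span against the source text with inline tags ignored. The "start" and "length" attribute names and the sample unit are assumptions for illustration only; nothing like this is defined in any draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit: the "start" and "length" attributes on <match>
# are made-up names, not part of any XLIFF draft.
UNIT = (
    '<unit id="u1">'
    '<segment id="s1"><source>Click <pc id="1">Save</pc> to keep your changes.</source></segment>'
    '<match start="6" length="4"/>'
    '</unit>'
)

def plain_text(elem):
    """Return the text content of elem with all inline tags ignored."""
    return "".join(elem.itertext())

def matched_text(unit):
    """Resolve the (start, length) span of <match> against the tag-free source."""
    source = unit.find("./segment/source")
    match = unit.find("./match")
    start = int(match.get("start"))
    length = int(match.get("length"))
    return plain_text(source)[start:start + length]

unit = ET.fromstring(UNIT)
print(matched_text(unit))
```

The <source> itself carries no extra markup here; all the span information lives on the <match> side.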

Two issues come to mind with offsets:

a) we would need to be extremely strict on how to handle whitespace. Currently there is room for choice by the tool.

b) any change to the content would require an update on all annotations. That may be a burdensome processing expectation.
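
Both issues can be seen in a small sketch. The Span type and the apply_insertion helper are hypothetical names, and only the simplest kind of edit (inserting characters) is covered.

```python
from dataclasses import dataclass

@dataclass
class Span:
    start: int
    length: int

def apply_insertion(spans, pos, count):
    """Return new spans adjusted for `count` characters inserted at offset `pos`."""
    updated = []
    for s in spans:
        if pos <= s.start:
            # Insertion at or before the span start: the whole span moves right.
            updated.append(Span(s.start + count, s.length))
        elif pos < s.start + s.length:
            # Insertion strictly inside the span: the span grows.
            updated.append(Span(s.start, s.length + count))
        else:
            # Insertion at or after the span end: the span is unaffected.
            updated.append(Span(s.start, s.length))
    return updated

def extract(text, span):
    """Return the characters of text covered by span."""
    return text[span.start:span.start + span.length]

# Issue (b): every annotation has to be revisited after an edit.
text = "Click Save to keep your changes."
spans = [Span(6, 4), Span(19, 12)]            # "Save", "your changes"
new_text = text[:11] + "now " + text[11:]     # insert 4 characters at offset 11
new_spans = apply_insertion(spans, 11, 4)
print([extract(new_text, s) for s in new_spans])

# Issue (a): the same word sits at a different offset once a tool
# collapses runs of whitespace.
raw = "Click  Save"                           # two spaces as stored
normalized = " ".join(raw.split())
print(raw.index("Save"), normalized.index("Save"))
```

Deletions and replacements would need similar rules, including a decision on what happens to a span whose text is removed entirely.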

But it has its advantages too: for example overlapping and superposing spans are cleanly handled, unlike with <mrk> where you might have to keep track of the nesting order.
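
For instance, two spans that partially overlap are just two independent records when offsets are used. This is a minimal sketch; the record layout and the "match" and "term" type names are made up.

```python
text = "Click Save to keep your changes."

# Two hypothetical annotations that partially overlap: neither one
# contains the other, so they could not be nested as elements.
annotations = [
    {"type": "match", "start": 0, "length": 10},   # "Click Save"
    {"type": "term", "start": 6, "length": 12},    # "Save to keep"
]

def covered_text(text, ann):
    """Return the characters covered by one annotation."""
    return text[ann["start"]:ann["start"] + ann["length"]]

def covering(annotations, offset):
    """Return the types of all annotations that cover the given offset."""
    return [
        a["type"]
        for a in annotations
        if a["start"] <= offset < a["start"] + a["length"]
    ]

print([covered_text(text, a) for a in annotations])
print(covering(annotations, 7))
```

No nesting order has to be tracked: each record is resolved against the plain text on its own.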


I think Rodolfo's suggestion also brings up the question of whether <source> is read-only or not.

To me, a modern XLIFF needs to allow enriching the source content. So we have to find a way to annotate both source and target content, whether by using offsets, elements like <mrk>, or another solution.


Cheers,
-yves

Follow-Ups:
- RE: [xliff] Generic mechanism for translation candidate elements and other annotations
  - From: "Rodolfo M. Raya" <rmraya@maxprograms.com>

References:
- RE: [xliff] Generic mechanism for translation candidate elements and other annotations
  - From: "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>