OASIS Mailing List Archives

 



xliff message



Subject: RE: [xliff] Generic mechanism for translation candidate elements and other annotations


Hi Helena,

 

It does indeed. In this context I think we probably agree that it means a position in the *parsed* text, so it is counted in Unicode code points (not bytes).

 

Cheers,

-yves

 

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman
Sent: Wednesday, March 07, 2012 9:32 AM
To: Rodolfo M. Raya
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] Generic mechanism for translation candidate elements and other annotations

 

The definition of offset should be tightened. If the source content is in UTF-8 and predominantly Japanese or Chinese, what does an offset mean in that context?



From:        "Rodolfo M. Raya" <rmraya@maxprograms.com>
To:        <xliff@lists.oasis-open.org>
Date:        03/07/2012 11:10 AM
Subject:        RE: [xliff] Generic mechanism for translation candidate elements and other annotations
Sent by:        <xliff@lists.oasis-open.org>





> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: Wednesday, March 07, 2012 1:13 PM
> To: xliff@lists.oasis-open.org
> Subject: RE: [xliff] Generic mechanism for translation candidate elements
> and other annotations
>
> R> The effect of re-segmentation over matches is not new. This time we
> R> have to add processing expectations that require updating matches
> R> according to the changes.
>
> It's certainly true. But any change that wouldn't involve keeping the
> information about the original span of content that was associated with the
> match would essentially be a loss of information.
> But maybe that is OK.

Any change in segmentation will have an effect on operations that depend on the original structure of source text, regardless of the annotation model you select.

BTW, we need a mechanism for storing the history of changes done to a <unit>.


> R> There may be a need to know what section of <source> is being matched
> R> and the relevant information should live in the corresponding <match>
> R> element, keeping the original <source> clean. This can be done, for
> R> example, by using 2 attributes: one attribute indicates the offset
> R> where the match starts and the other indicates the length of the text
> R> matched (in both cases ignoring tags).
>
> Using start/length (or start/end) positions is something that we have not
> explored much.
> It could be a way to replace <mrk> completely.

It deserves to be explored. We should avoid altering <source> as much as possible.
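A minimal sketch of the start/length idea, assuming both values are counted in code points over the text of <source> with inline tags removed. The helper names and the sample segment are hypothetical. Note that `itertext()` keeps text found *inside* elements, so inline codes that carry native data (for example XLIFF 1.2 <ph> or <bpt>) would need to be skipped separately; this sketch only covers <g>/<x>-style tags.

```python
import xml.etree.ElementTree as ET

def plain_text(source_xml: str) -> str:
    """Concatenate the text of <source>, ignoring the inline tags themselves."""
    return "".join(ET.fromstring(source_xml).itertext())

def matched_span(source_xml: str, offset: int, length: int) -> str:
    """Return the portion of the tag-free source text covered by a match."""
    text = plain_text(source_xml)
    if offset < 0 or length < 0 or offset + length > len(text):
        raise ValueError("match span lies outside the source text")
    return text[offset:offset + length]

# Hypothetical segment; the match would carry offset=6 and length=4.
source = '<source>Click <g id="1">Save</g> to store the file.</source>'
print(plain_text(source))          # Click Save to store the file.
print(matched_span(source, 6, 4))  # Save
```

The <source> element itself is never modified; the match only needs to store the two numbers.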


> Two issues come to mind with offsets:
>
> a) we would need to be extremely strict on how to handle white spaces.
> Currently there is room for choice by the tool.

Sure, we need to be strict regarding the way offsets are measured.

We can, for example, require space normalization for offset and length calculation when xml:space is set to "default". Normalization can be done by replacing every substring composed of multiple white-space characters with a single space character.
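The normalization rule could be sketched in Python as follows. The function name is hypothetical, "white space" is assumed to mean the XML white-space characters (space, tab, CR, LF), and, as an additional assumption, a lone tab or line break is also turned into a single space. The example shows why the rule matters: the same word gets a different offset in the raw and the normalized text.

```python
import re

def normalize_for_offsets(text: str, xml_space: str = "default") -> str:
    """Collapse runs of XML white space to one space unless xml:space="preserve"."""
    if xml_space == "preserve":
        return text
    return re.sub(r"[ \t\r\n]+", " ", text)

raw = "Open  the\n   file."
norm = normalize_for_offsets(raw)
print(repr(norm))                            # 'Open the file.'
print(raw.index("file"), norm.index("file"))  # 13 9
```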


> b) any change to the content would require an update on all annotations.
> That may be a burdensome processing expectation.

It's also troublesome when you use <mrk>. Adjustments in segmentation imply adjustments in matching or other processes regardless of the annotation model you select.
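To make this processing expectation concrete, here is a hypothetical sketch of what merge and split would require if annotations were stored as (start, length) offsets. The `Span` record, the function names and the `joiner` parameter (the white space inserted between merged segments) are all assumptions. A merge only shifts the second segment's spans, but a split cannot keep a span that crosses the split point, which is the loss of information Yves describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # offset in code points into the tag-free text
    length: int
    ref: str     # hypothetical id of the <match> or other annotation

def merge_segments(first_text, first_spans, second_text, second_spans, joiner=""):
    """Join two segments; spans of the second one move right by len(first + joiner)."""
    shift = len(first_text) + len(joiner)
    shifted = [Span(s.start + shift, s.length, s.ref) for s in second_spans]
    return first_text + joiner + second_text, list(first_spans) + shifted

def split_segment(text, spans, at):
    """Split a segment at offset `at`.

    Spans entirely on one side survive (right-hand ones are shifted left);
    spans straddling the split point are returned separately because they
    can no longer be attached to a single segment.
    """
    left_spans, right_spans, broken = [], [], []
    for s in spans:
        if s.start + s.length <= at:
            left_spans.append(s)
        elif s.start >= at:
            right_spans.append(Span(s.start - at, s.length, s.ref))
        else:
            broken.append(s)
    return (text[:at], left_spans), (text[at:], right_spans), broken

text, spans = merge_segments("Hello world.", [Span(0, 5, "m1")],
                             "Save the file.", [Span(5, 3, "m2")], joiner=" ")
print(text)   # Hello world. Save the file.
print(spans)  # m2 now starts at 18
```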


> But it has its advantages too: for example overlapping and superposing spans
> are cleanly handled, unlike with <mrk> where you might have to keep track
> of the nesting order.

The biggest advantage is that <source> doesn't have to be altered with extra elements.
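A short sketch of why offsets handle overlap cleanly: two spans that partially overlap can both be stored as offsets, but they cannot both be written as well-formed inline elements, because XML elements cannot cross. The helper names and span values are hypothetical, and spans are given here as start/end pairs rather than start/length.

```python
def crosses(a, b):
    """True if (start, end) spans a and b partially overlap, i.e. neither contains the other."""
    (a0, a1), (b0, b1) = sorted([a, b])
    return a0 < b0 < a1 < b1

def nestable(spans):
    """True if every pair of spans could be written as properly nested inline elements."""
    return not any(crosses(a, b) for i, a in enumerate(spans) for b in spans[i + 1:])

text = "the quick brown fox"
term = (4, 15)    # "quick brown", e.g. a terminology annotation
match = (10, 19)  # "brown fox", e.g. the span covered by a translation candidate

print(text[term[0]:term[1]], "|", text[match[0]:match[1]])  # quick brown | brown fox
print(nestable([term, match]))  # False: as <mrk> one span would have to be split
```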
 
> I think Rodolfo's suggestion also brings up the question of <source> being
> read-only or not.

That was a basic unwritten principle from the early days of XLIFF 1.0. I would certainly make <source> read-only, allowing only merge and split operations for segmentation purposes.

> To me a modern XLIFF needs to be able to allow enriching the source
> content. So we have to find a way to annotate both source and target content, whether it's
> using offsets or elements like <mrk>, or another solution.

Offsets offer a clean way, with the advantage that annotations could be easily removed without affecting <source>.

Offsets can also be used to annotate <target> elements without affecting the translation. There is no obstacle to using the same mechanism for <source>, <target> and perhaps other elements.

Regards,
Rodolfo
--
Rodolfo M. Raya       rmraya@maxprograms.com
Maxprograms      
http://www.maxprograms.com







