xliff message
Subject: RE: [xliff] Generic mechanism for translation candidate elements and other annotations
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: "Rodolfo M. Raya" <rmraya@maxprograms.com>
- Date: Wed, 7 Mar 2012 11:32:09 -0500
The definition of offset should be tightened.
If the source content is in UTF8 and predominantly Japanese or Chinese,
what does an offset mean in that context?
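A minimal sketch of the ambiguity, assuming a made-up sample string that is not taken from the thread: the same position in the text yields three different numbers depending on whether an offset counts Unicode code points, UTF-16 code units, or UTF-8 bytes.

```python
# Hypothetical sample: U+20BB7 (a CJK character outside the BMP),
# two more CJK characters, a space, then a Latin word.
text = "\U00020BB7\u91CE\u5BB6 sample"

def offsets_of(text, needle):
    """Return the start of `needle` measured in three different units."""
    code_points = text.index(needle)                    # Python str indexes code points
    prefix = text[:code_points]
    utf16_units = len(prefix.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
    utf8_bytes = len(prefix.encode("utf-8"))
    return {
        "code_points": code_points,
        "utf16_units": utf16_units,
        "utf8_bytes": utf8_bytes,
    }

result = offsets_of(text, "sample")
print(result)
```

Whichever unit is chosen, the definition would have to name it explicitly so that tools written in different languages agree on the same position.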
From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
To: <xliff@lists.oasis-open.org>
Date: 03/07/2012 11:10 AM
Subject: RE: [xliff] Generic mechanism for translation candidate elements and other annotations
Sent by: <xliff@lists.oasis-open.org>
> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: Wednesday, March 07, 2012 1:13 PM
> To: xliff@lists.oasis-open.org
> Subject: RE: [xliff] Generic mechanism for translation candidate elements
> and other annotations
>
> R> The effect of re-segmentation over matches is not new. This time we
> R> have to add processing expectations that require updating matches
> R> according to the changes.
>
> It's certainly true. But any change that wouldn't involve keeping the
> information about the original span of content that was associated with the
> match would essentially be a loss of information.
> But maybe that is OK.
Any change in segmentation will have an effect on operations that depend
on the original structure of source text, regardless of the annotation
model you select.
BTW, we need a mechanism for storing the history of changes done to a <unit>.
> R> There may be a need to know what section of <source> is being matched
> R> and the relevant information should live in the corresponding <match>
> R> element, keeping the original <source> clean. This can be done, for
> R> example, by using 2 attributes: one attribute indicates the offset
> R> where the match starts and the other indicates the length of the text
> R> matched (in both cases ignoring tags).
>
> Using start/length (or start/end) positions is something that we have not
> explored much.
> It could be a way to replace completely <mrk>.
It deserves to be explored. We should avoid altering <source> as
much as possible.
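A rough sketch of how an offset and a length could be computed while ignoring tags. The sample <source> content and the function names are hypothetical, the attribute names for <match> have not been defined anywhere in this thread, and offsets are counted in Unicode code points as an assumption.

```python
import xml.etree.ElementTree as ET

def plain_text(source_xml):
    """Return the text of a <source> element with inline tags ignored."""
    return "".join(ET.fromstring(source_xml).itertext())

def locate(source_xml, matched_text):
    """Return (offset, length) of matched_text in the tag-free text, or None."""
    text = plain_text(source_xml)
    offset = text.find(matched_text)
    if offset < 0:
        return None
    return offset, len(matched_text)

# Hypothetical example: the inline <g> element is skipped when counting.
source_xml = '<source>Press <g id="1">Save</g> to continue</source>'
span = locate(source_xml, "Save to")
print(plain_text(source_xml))
print(span)
```

The <source> element is only read here, never modified; the resulting pair of numbers is what would be stored on the <match> side.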
> Two issues come to mind with offsets:
>
> a) we would need to be extremely strict on how to handle white spaces.
> Currently there is room for choice by the tool.
Sure, we need to be strict regarding the way offsets are measured.
We can, for example, require space normalization for offset and length
calculation when xml:space is set to "default". Normalization can be done
by replacing every substring composed of multiple white-space characters
with a single space character.
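A minimal sketch of that normalization rule, assuming the four XML white-space characters (space, tab, CR, LF) and offsets counted in code points. The function names are illustrative only and not part of any specification.

```python
import re

def normalize_spaces(text):
    """Collapse every run of XML white space into a single space character."""
    return re.sub(r"[ \t\r\n]+", " ", text)

def match_span(source_text, matched_text, xml_space="default"):
    """Return (offset, length) of matched_text in source_text, or None.

    Both strings are normalized first only when xml_space is "default".
    """
    if xml_space == "default":
        source_text = normalize_spaces(source_text)
        matched_text = normalize_spaces(matched_text)
    offset = source_text.find(matched_text)
    if offset < 0:
        return None
    return offset, len(matched_text)

print(match_span("Hello   big\n world", "big world"))
print(match_span("Hello   big\n world", "big world", xml_space="preserve"))
```

With xml:space set to "preserve" the text is compared as-is, so the same match is not found in the second call.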
> b) any change to the content would require an update on all annotations.
> That may be a burdensome processing expectation.
It's also troublesome when you use <mrk>. Adjustments in segmentation
imply adjustments in matching or other processes regardless of the
annotation model you select.
> But it has its advantages too: for example overlapping and superposing spans
> are cleanly handled, unlike with <mrk> where you might have to keep track
> of the nesting order.
The biggest advantage is that <source> doesn't have to be altered
with extra elements.
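As an illustration of both points, a small sketch of stand-off annotations kept outside the text. The class, the annotation kinds and the sample sentence are all hypothetical. Two of the spans overlap without one containing the other, and dropping an annotation leaves the source string untouched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    kind: str
    offset: int   # start position in the tag-free source text
    length: int   # number of characters covered

source = "Open the file menu and select Print"

annotations = [
    Annotation("match", 0, 18),     # "Open the file menu"
    Annotation("term", 9, 9),       # "file menu"
    Annotation("comment", 14, 21),  # "menu and select Print", overlaps "term"
]

def covered_text(text, ann):
    """Return the part of text that an annotation refers to."""
    return text[ann.offset:ann.offset + ann.length]

def annotations_at(position, anns):
    """Return the kinds of all annotations covering a given position."""
    return [a.kind for a in anns if a.offset <= position < a.offset + a.length]

# Removing an annotation is a list operation; the source is not modified.
remaining = [a for a in annotations if a.kind != "term"]

for ann in annotations:
    print(ann.kind, repr(covered_text(source, ann)))
```

No nesting order has to be tracked here, because each annotation refers to the text independently of the others.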
> I think Rodolfo's suggestion also brings up the question of <source> being
> read-only or not.
That was a basic unwritten principle from the early days of XLIFF 1.0.
I would certainly make <source> read-only, allowing only merge and
split operations for segmentation purposes.
> To me a modern XLIFF needs to be able to allow enriching the source
> content. So we have to find a way to annotate both content; whether it's
> using offsets or elements like <mrk>, or another solution.
Offsets offer a clean way, with the advantage that annotations could be
easily removed without affecting <source>.
Offsets can also be used to annotate <target> elements without affecting
the translation. There is no obstacle to using the same mechanism for
<source>, <target> and perhaps other elements.
Regards,
Rodolfo
--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com