
Subject: RE: [xliff] Simplified XLIFF element tree

From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
To: "'xliff'" <xliff@lists.oasis-open.org>
Date: Mon, 23 Aug 2010 09:37:46 -0300

Hi again,

There are two different issues to consider:

1) How to represent a single segment. 
2) How to represent segmentation information in an XLIFF file. 

I propose to use <trans-unit> with a <source>/<target> pair to represent a single segment (issue 1).

Before analyzing issue 2, we need to define some basic concepts and use cases. I will call a portion of extracted translatable text that can be split into two or more "segments" a "text block".

As Andrzej suggested, a "text block" can be represented using the existing <group> element. Each component "segment" can be stored in its own <trans-unit>. He provided an example that clearly shows how to do that.
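
To make that layout concrete, here is a minimal sketch in Python. It is not Andrzej's example: the ids, the sample sentences and the `segments` helper are made up for illustration, and the XLIFF namespace and the enclosing <file>/<body> elements are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one extracted paragraph (the "text block") stored in a
# <group>, with each segment in its own <trans-unit>. Ids and text are made up.
XLIFF_FRAGMENT = """
<group id="p1">
  <trans-unit id="p1.s1">
    <source>This is the first sentence.</source>
    <target>Esta es la primera frase.</target>
  </trans-unit>
  <trans-unit id="p1.s2">
    <source>This is the second one.</source>
    <target>Esta es la segunda.</target>
  </trans-unit>
</group>
"""


def segments(group):
    """Return (id, source, target) for each <trans-unit> directly in a <group>."""
    return [
        (tu.get("id"), tu.findtext("source"), tu.findtext("target"))
        for tu in group.findall("trans-unit")
    ]


group = ET.fromstring(XLIFF_FRAGMENT)
for seg_id, source, target in segments(group):
    print(seg_id, "|", source, "|", target)
```

Each <trans-unit> carries exactly one <source>/<target> pair, so a TM or MT tool can read the segments one by one without looking inside <source> for markers.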

The problematic use cases requiring segmentation information that I'm aware of are:

a) Wrong segmentation at text extraction time that needs to be fixed at translation time (an unrecognized abbreviation for example).

b) Translation of "m" source segments that requires "n" segments in the target language.

Both cases can be properly handled by allowing translators to merge and split <trans-unit> elements within a given <group> as needed. This can be done today using <group> and <trans-unit> and their existing attributes. I've implemented this mechanism several times and I know it works very well with XLIFF 1.0, 1.1 and 1.2.

We need to define in XLIFF 2.0 the official way in which segments can be merged or split at translation time. 

Regards,
Rodolfo
--
Rodolfo M. Raya   <rmraya@maxprograms.com>
Maxprograms      http://www.maxprograms.com


> -----Original Message-----
> From: Yves Savourel [mailto:ysavourel@translate.com]
> Sent: Monday, August 23, 2010 9:05 AM
> To: 'xliff'
> Subject: RE: [xliff] Simplified XLIFF element tree
> 
> >...You can segment the <para> or <p> at text
> > extraction time and put each segment in its own <trans-unit>.
> 
> I agree with Asgeir: extracting and segmenting should be two distinct
> operations. While they can be done transparently at the same time for the
> user, I think it's important to make a distinction between the representation
> of the extracted unit and the segments.
> 
> 
> >...If you use a spanning mechanism inside source, you will
> > have multiple segments in source and target and the number
> > of source fragments may not match the number of target
> > fragments; that's very bad for TM/MT support and not XSLT
> > friendly at all.
> 
> I agree with Rodolfo: there are some drawbacks with using spans: order,
> number of segments, etc. But those issues are maybe a product of
> segmentation-related processes we will always have. For example an
> automated tool can create a tentative alignment with n-to-m cases and
> provide the result in XLIFF for a user to finish/correct the aligned set.
> 
> Maybe there are other representations we can have other than using
> <trans-unit> or using <seg-source> that would allow a more seamless tracking of
> segments. We need to imagine it.
> 
> -ys
> 
> 

Follow-Ups:
- RE: [xliff] Simplified XLIFF element tree
  - From: Yves Savourel <ysavourel@translate.com>
- Re: [xliff] Simplified XLIFF element tree
  - From: Andrzej Zydron <azydron@xtm-intl.com>

References:
- Re: [xliff] Simplified XLIFF element tree
  - From: Asgeir Frimannsson <asgeirf@redhat.com>
- RE: [xliff] Simplified XLIFF element tree
  - From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
- RE: [xliff] Simplified XLIFF element tree
  - From: Yves Savourel <ysavourel@translate.com>