Subject: XLIFF 2.0 Core

From: Yves Savourel <ysavourel@translate.com>
To: <xliff@lists.oasis-open.org>
Date: Wed, 6 Apr 2011 05:41:58 -0600

To follow up on the teleconference discussion about the core.

One possible way to move forward would be to define the basic unit of extraction and build from there.


===== Segments

I would argue that segments need to be part of that basic structure.

The main reason is that if segmentation is represented through some optional structure, that structure cannot be as simple or as well integrated as it would be if it were part of the core.
 
--- What if the content is not "segmented"?

That's fine: even if no segmentation process has been applied to the content, the result of extracting an item already constitutes a segment. The content of an extracted unit is simply made of at least one segment.

This has several advantages:

- there is no difference between accessing segmented content and content that is not segmented.
- any property applicable to a segment can be set at the proper level right from extraction.
- there is no reason to duplicate the content.
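
To illustrate the first point, here is a rough Python sketch. It assumes the by-segment layout shown under Representation in this message; the element names, the sample text, and the helper source_segments() are only placeholders for illustration, not proposed XLIFF 2.0 syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an extracted unit that was never segmented
# still holds exactly one <seg>.
UNSEGMENTED = (
    "<unit id='1'>"
    "<seg id='1'><source>First sentence. Second sentence.</source></seg>"
    "</unit>"
)

# The same content after a segmentation process has been applied.
SEGMENTED = (
    "<unit id='2'>"
    "<seg id='1'><source>First sentence. </source></seg>"
    "<seg id='2'><source>Second sentence.</source></seg>"
    "</unit>"
)

def source_segments(unit_xml):
    """Return the source text of every segment in a unit, in document order."""
    unit = ET.fromstring(unit_xml)
    return [seg.findtext("source", default="") for seg in unit.findall("seg")]

# The caller uses the same code path for both units.
for sample in (UNSEGMENTED, SEGMENTED):
    print(source_segments(sample))
```

The reading code has no special case for "not segmented": it just gets a list with a single entry.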

If there is a need to know whether the content has been through a segmentation process or not, we could also have an attribute for this.


--- What about tools that do not handle "segments"?

Such a tool would import the XLIFF data in a way that each XLIFF segment corresponds to one of the basic units for that tool.

I suppose another option for such a tool could be to re-assemble all the segments of the unit and use that as the basic unit. A change in segmentation is one of the things that should not cause problems for the original tool when it merges back the extracted text.
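
The two import options could look roughly like this in Python. This is only a sketch: the markup follows the by-segment sample in this message, the function names and the "unit/segment" key format are invented for the example, and joining segments with an empty string assumes that any whitespace between segments is stored inside the segments themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit using the by-segment layout from this message.
UNIT = (
    "<unit id='1'>"
    "<seg id='1'><source>source content 1</source>"
    "<target>target content 1</target></seg>"
    "<seg id='2'><source>source content 2</source>"
    "<target>target content 2</target></seg>"
    "</unit>"
)

def import_per_segment(unit_xml):
    """Option 1: each XLIFF segment becomes one basic item in the tool."""
    unit = ET.fromstring(unit_xml)
    return [
        {
            # Key format is an assumption made for this example only.
            "key": f"{unit.get('id')}/{seg.get('id')}",
            "source": seg.findtext("source", default=""),
        }
        for seg in unit.findall("seg")
    ]

def import_per_unit(unit_xml):
    """Option 2: re-assemble all segments so the unit is one basic item."""
    unit = ET.fromstring(unit_xml)
    # Assumes inter-segment whitespace, if any, lives inside the segments.
    text = "".join(
        seg.findtext("source", default="") for seg in unit.findall("seg")
    )
    return {"key": unit.get("id"), "source": text}
```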


===== Representation

I see two main ways to represent this: grouping by language or grouping by segment.

All segments of the same language grouped together:

<unit id='1'>
 <source>
  <seg id='1'>source segment 1</seg>
  <seg id='2'>source segment 2</seg>
 </source>
 <target>
  <seg id='1'>target segment 1</seg>
  <seg id='2'>target segment 2</seg>
 </target>
</unit>

Or all the languages of the same segment grouped together:

<unit id='1'>
  <seg id='1'>
   <source>source content 1</source>
   <target>target content 1</target>
  </seg>
  <seg id='2'>
   <source>source content 2</source>
   <target>target content 2</target>
  </seg>
</unit>

They both have small advantages and drawbacks.
But the important point is the extra segment level that 2.0 would introduce.
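
For what it's worth, the two layouts carry the same information, and one can be regrouped into the other mechanically. Here is a small Python sketch of that, assuming source and target segments are paired by their id values as in the samples above; the function name to_by_segment() is made up for the example.

```python
import xml.etree.ElementTree as ET

# The by-language sample from this message.
BY_LANGUAGE = """
<unit id='1'>
 <source>
  <seg id='1'>source segment 1</seg>
  <seg id='2'>source segment 2</seg>
 </source>
 <target>
  <seg id='1'>target segment 1</seg>
  <seg id='2'>target segment 2</seg>
 </target>
</unit>
"""

def to_by_segment(unit_xml):
    """Regroup a by-language unit so each <seg> holds its source and target."""
    old = ET.fromstring(unit_xml)
    # Index target segments by id so they can be paired with source segments.
    targets = {seg.get("id"): seg.text for seg in old.findall("target/seg")}
    new = ET.Element("unit", id=old.get("id"))
    for src in old.findall("source/seg"):
        seg = ET.SubElement(new, "seg", id=src.get("id"))
        ET.SubElement(seg, "source").text = src.text
        # A source segment without a matching target simply gets no <target>.
        if src.get("id") in targets:
            ET.SubElement(seg, "target").text = targets[src.get("id")]
    return new

print(ET.tostring(to_by_segment(BY_LANGUAGE), encoding="unicode"))
```

Either way a reader has to go through the segment level, which is the point above.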

Cheers,
-ys

Follow-Ups:
- Re: [xliff] XLIFF 2.0 Core
  - From: Helena S Chapman <hchapman@us.ibm.com>
- RE: [xliff] XLIFF 2.0 Core
  - From: "Rodolfo M. Raya" <rmraya@maxprograms.com>