Subject: Re: [xliff] XLIFF 2.0 Core

I agree that the reference to the definition of segment should be mandatory. And I prefer this to be done by language, for simplicity. In reality, if I pick up a piece of English content (forget translation for a minute), the way I segment the content would probably be the same as how you would do it about 80% of the time. The main differences often reside in the domain-specific information.

Thinking long term, the way segmentation is used within the localization industry is somewhat haphazard. The authoring environment has one segmentation, the CMS has one, the GMS has one (for memories), the CAT tool has one, and transformation/formatting services (engines) may have yet another, and they are probably not consistent with one another. That, by itself, is an interesting problem to solve.

Best regards,

Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From:        Yves Savourel <ysavourel@translate.com>
To:        <xliff@lists.oasis-open.org>
Date:        04/06/2011 07:42 AM
Subject:        [xliff] XLIFF 2.0 Core

To follow up on the teleconference discussion about the core.

One possible way to move forward could be to define the basic unit of extraction and build from there.

===== Segments

I would argue that segments need to be part of that basic structure.

The main reason for this is that if the segmentation representation is done through some optional structure, such a structure could not be as simple and as integrated as it would be if it were part of the core.

--- What if the content is not "segmented"?

It's fine: even if no segmentation process has been applied to the content, the result of the extraction of an item already constitutes a segment. The content of an extracted unit is simply made of at least one segment.

This has several advantages:

- there is no difference between accessing segmented content and content that is not segmented.
- any property applicable to a segment can be set at the proper level right from extraction.
- there is no reason to duplicate the content.

If there is a need to know whether the content has been through a segmentation process or not, we could also have an attribute for this.
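To make the "no difference in access" point concrete, here is a minimal sketch in Python. It assumes a layout where each segment holds its own source content, with element names (unit, seg, source) borrowed from the representation sketches in this message. The "segmented" attribute is a purely hypothetical name for the flag mentioned above; nothing here is taken from an actual XLIFF draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names follow the sketches in this message;
# the "segmented" attribute is an invented name used only for illustration.
UNSEGMENTED = """
<unit id='1' segmented='no'>
 <seg id='1'><source>First sentence. Second sentence.</source></seg>
</unit>
"""

SEGMENTED = """
<unit id='2' segmented='yes'>
 <seg id='1'><source>First sentence. </source></seg>
 <seg id='2'><source>Second sentence.</source></seg>
</unit>
"""


def source_segments(unit_xml):
    """Return the source text of every segment of a unit, in document order."""
    unit = ET.fromstring(unit_xml.strip())
    return [seg.findtext("source", default="") for seg in unit.findall("seg")]


def was_segmented(unit_xml):
    """Read the hypothetical flag saying whether segmentation was applied."""
    return ET.fromstring(unit_xml.strip()).get("segmented") == "yes"


# The same accessor works for both units; only the number of segments differs.
for sample in (UNSEGMENTED, SEGMENTED):
    print(was_segmented(sample), source_segments(sample))
```

The reading code never branches on whether a segmentation step took place: an unsegmented unit is just a unit with exactly one segment.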

--- What about tools that do not handle "segments"?

Such a tool would import the XLIFF data in a way that each XLIFF segment corresponds to one of the basic units for that tool.

I suppose another option for such a tool could be to re-assemble all the segments of the unit and use that as the basic unit. A change of segmentation is one of the aspects that should not cause a problem for the original tool when it merges back the extracted text.
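The re-assembly option could look roughly like the following Python sketch. It rests on two assumptions that are not settled anywhere in this message: that any whitespace between segments is kept inside the segments themselves, so plain concatenation restores the unit, and that the content is plain text without inline codes. The element names are again the ones used in the representation sketches in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit: the trailing space of the first segment is stored
# inside that segment (an assumption of this sketch).
UNIT = """
<unit id='1'>
 <seg id='1'><source>Hello world. </source></seg>
 <seg id='2'><source>How are you?</source></seg>
</unit>
"""


def reassemble_source(unit_xml):
    """Join the source text of all segments of a unit into a single string."""
    unit = ET.fromstring(unit_xml.strip())
    # findtext() only returns the text before any child element, so this
    # works for plain-text segments only.
    return "".join(
        seg.findtext("source", default="") for seg in unit.findall("seg")
    )


print(reassemble_source(UNIT))
```

A tool that works this way sees one basic unit per extracted item, regardless of how many segments the unit holds.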

===== Representation

I see two possible main ways to represent this: grouping by segments or grouping by language.

All segments of the same language grouped together:

<unit id='1'>
 <seg id='1'>source segment 1</seg>
 <seg id='2'>source segment 2</seg>
 <seg id='1'>target segment 1</seg>
 <seg id='2'>target segment 2</seg>
</unit>

Or all the languages of the same segment grouped together:

<unit id='1'>
 <seg id='1'>
  <source>source content 1</source>
  <target>target content 1</target>
 </seg>
 <seg id='2'>
  <source>source content 2</source>
  <target>target content 2</target>
 </seg>
</unit>

They both have small advantages and drawbacks.
But the important point is the extra segment level that 2.0 would introduce.

