xliff message
Subject: Re: [xliff] XLIFF 2.0 Core
- From: Helena S Chapman <hchapman@us.ibm.com>
- To: Yves Savourel <ysavourel@translate.com>, <xliff@lists.oasis-open.org>
- Date: Wed, 6 Apr 2011 09:49:37 -0400
I agree that the reference to the definition of a segment should be
mandatory, and I prefer this done by language for simplicity. In reality,
if I pick up a piece of English content (forget translation for a minute),
the way I segment it would probably match the way you would about 80% of
the time. The main differences often lie in domain-specific information.
Thinking long term, the way segmentation is used within the localization
industry is somewhat haphazard. The authoring environment has one, the CMS
has one, the GMS has one (for memories), the CAT tool has one, and
transformation/formatting services (engines) may have yet another, and they
are probably not consistent with one another. That, by itself, is an
interesting problem to solve.
Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts
From: Yves Savourel <ysavourel@translate.com>
To: <xliff@lists.oasis-open.org>
Date: 04/06/2011 07:42 AM
Subject: [xliff] XLIFF 2.0 Core
To follow up on the teleconference discussion about the core: one possible
way to move forward is to define the basic unit of extraction and build
from there.
===== Segments
I would argue that segments need to be part of that basic structure.
The main reason is that if segmentation is represented through some
optional structure, that structure cannot be as simple or as well
integrated as it would be as part of the core.
--- What if the content is not "segmented"?
It's fine: even if no segmentation process has been applied to the content,
the result of extracting an item already constitutes a segment.
The content of an extracted unit is simply made of at least one segment.
This has several advantages:
- there is no difference between accessing segmented content and content
that is not segmented.
- any property applicable to a segment can be set at the proper level right
from extraction.
- there is no reason to duplicate the content.
If there is a need to know whether a content has been through a segmentation
process or not, we could also have an attribute for this.
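As a rough illustration of the points above, here is a minimal Python sketch of a data model in which an extracted unit always holds at least one segment. The names `Segment`, `Unit`, `extract_unit`, `apply_segmentation` and the `segmented` flag are hypothetical and only stand in for whatever the specification would eventually define.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    source: str
    target: str = ""


@dataclass
class Unit:
    id: str
    segments: List[Segment] = field(default_factory=list)
    # Hypothetical flag: records whether a segmentation process was applied.
    segmented: bool = False


def extract_unit(unit_id: str, text: str) -> Unit:
    """Extraction without segmentation: the whole item is a single segment."""
    return Unit(id=unit_id, segments=[Segment(source=text)])


def apply_segmentation(unit: Unit, pieces: List[str]) -> Unit:
    """Build a new unit from pieces that some segmenter already produced."""
    return Unit(
        id=unit.id,
        segments=[Segment(source=piece) for piece in pieces],
        segmented=True,
    )


def source_texts(unit: Unit) -> List[str]:
    """Same access path whether or not the unit has been segmented."""
    return [segment.source for segment in unit.segments]
```

A consumer that only calls `source_texts` does not need to know whether segmentation happened; the flag is there only for tools that care.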
--- What about tools that do not handle "segments"?
Such a tool would import the XLIFF data so that each XLIFF segment
corresponds to one of that tool's basic units.
I suppose another option for such a tool could be to re-assemble all the
segments of the unit and use that as the basic unit. Segmentation change
is one of the aspects that should not cause a problem for the original tool
when merging back the extracted text.
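The re-assembly option could look roughly like the following Python sketch, which uses the standard `xml.etree.ElementTree` module. It assumes the per-segment layout (each `<seg>` holding its own `<source>`) shown later in this message, plain-text content without inline codes, and segments that carry their own trailing whitespace; the sample text and the `reassemble_source` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; element names follow the draft layout in this thread.
UNIT_XML = """
<unit id='1'>
  <seg id='1'><source>First sentence. </source></seg>
  <seg id='2'><source>Second sentence.</source></seg>
</unit>
"""


def reassemble_source(unit: ET.Element) -> str:
    """Join the source text of every segment, in document order.

    Assumption: whitespace between segments is stored inside the segments,
    so a plain concatenation restores the unsegmented text.
    """
    return "".join(
        seg.findtext("source", default="") for seg in unit.findall("seg")
    )


unit = ET.fromstring(UNIT_XML)
whole_text = reassemble_source(unit)
print(whole_text)
```

A tool with no notion of segments could then treat `whole_text` as its single basic unit for that `<unit>`.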
===== Representation
I see two possible main ways to represent this: grouping by segments or
grouping by language.
All segments of the same language grouped together:
<unit id='1'>
  <source>
    <seg id='1'>source segment 1</seg>
    <seg id='2'>source segment 2</seg>
  </source>
  <target>
    <seg id='1'>target segment 1</seg>
    <seg id='2'>target segment 2</seg>
  </target>
</unit>
Or all the languages of the same segment grouped together:
<unit id='1'>
  <seg id='1'>
    <source>source content 1</source>
    <target>target content 1</target>
  </seg>
  <seg id='2'>
    <source>source content 2</source>
    <target>target content 2</target>
  </seg>
</unit>
They both have small advantages and drawbacks.
But the important point is the extra segment level that 2.0 would introduce.
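To show that the two layouts carry the same information, here is a small Python sketch that converts the grouped-by-language form into the grouped-by-segment form. It assumes that source and target segments are paired by matching `id` values and that segments contain plain text only; `to_segment_grouping` is a hypothetical helper, not part of any XLIFF tool.

```python
import xml.etree.ElementTree as ET

BY_LANGUAGE = """<unit id='1'>
<source>
<seg id='1'>source segment 1</seg>
<seg id='2'>source segment 2</seg>
</source>
<target>
<seg id='1'>target segment 1</seg>
<seg id='2'>target segment 2</seg>
</target>
</unit>"""


def to_segment_grouping(unit: ET.Element) -> ET.Element:
    """Regroup <source>/<target> lists of <seg> into <seg> pairs."""
    # Index target segments by id (assumed pairing rule).
    targets = {seg.get("id"): seg.text or "" for seg in unit.findall("target/seg")}
    regrouped = ET.Element("unit", {"id": unit.get("id", "")})
    for src in unit.findall("source/seg"):
        seg_id = src.get("id", "")
        seg = ET.SubElement(regrouped, "seg", {"id": seg_id})
        ET.SubElement(seg, "source").text = src.text or ""
        # Emit a <target> only when a matching target segment exists.
        if seg_id in targets:
            ET.SubElement(seg, "target").text = targets[seg_id]
    return regrouped


converted = to_segment_grouping(ET.fromstring(BY_LANGUAGE))
print(ET.tostring(converted, encoding="unicode"))
```

The reverse conversion is symmetrical, which suggests the choice between the two is mostly about convenience for processing rather than expressive power.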
Cheers,
-ys