OASIS Mailing List Archives



xliff message


Subject: RE: [xliff] Segmentation as core or not

It almost reads as if what the localization industry is used to calling a "segment" is really a "partition": basically something that has been cut and classified, but could be further divided or broken off into finer fragments? Since I have only been involved in localization for the last 3-4 years, I am probably close to having those un-tainted eyes.

To me, a segment in the localization world is something that usually has something to do with payment. That is, even if one is paying for a service by the word, the cost of each word can still be determined by the complexity of a segment (e.g., its length).

From:        Yves Savourel <ysavourel@enlaso.com>
To:        Helena S Chapman/San Jose/IBM@IBMUS
Cc:        <xliff@lists.oasis-open.org>
Date:        11/01/2011 11:02 PM
Subject:        RE: [xliff] Segmentation as core or not

Hi Helena,
I guess it would theoretically be possible to have an entire chapter in one “part”, but the extraction tools would not likely do that. Even when there is no sentence-based segmentation, the extractors do break the content down into much smaller parts: typically the equivalent of paragraphs for document-type files, or strings for UI-type files.
Actually, quite a few tools, especially for software, don’t go beyond that type of segmentation. Look at many tools for PO files or Java properties files, for example: their entries are often not sentence-segmented, yet the TMX files they create call those entries “segments”.
Others may correct me, but I think calling those extracted parts “segments” is simply a relatively common practice.
Personally, I think the important thing is to be very clear about what those “parts” are, regardless of how we end up naming the elements. That said, we should obviously pick a name that is not too confusing.
It seems “segment” has been used for a while to mean the container of both un-segmented and segmented content (see for example TMX’s <seg>), but maybe I’ve been too deep in TMX/XLIFF/etc. for too long to see the world with un-tainted eyes :)
Hope this helps,
From: Helena S Chapman [mailto:hchapman@us.ibm.com]
Sent: Tuesday, November 01, 2011 7:52 PM
To: Yves Savourel
Subject: Re: [xliff] Segmentation as core or not

Yves, I want to make sure I understand your viewpoint. Based on what you suggested, is it possible to have an entire chapter or book as a single *part* when passing it around in an XLIFF file? If so, why call it a segment?

<unit id='1'>
 <segment>
  <source>Sentence one. Sentence two. Sentence three. .... Sentence two thousand and forty five.</source>
 </segment>
</unit>

Best regards,

Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From: Yves Savourel <ysavourel@enlaso.com>
Date: 11/01/2011 04:56 PM
Subject: [xliff] Segmentation as core or not

Hi all,

To continue on the discussion whether the "segmentation" feature is core or not:

I think Dave has an obviously valid point when saying that segmentation is not necessarily done at the time of the extraction, and therefore we could have un-segmented XLIFF.

But to me, a "segment" is not necessarily the result of a segmentation process; it can also be a "block" extracted from the original format (as our definition states).
So each un-segmented entry is, by nature, a segment that may simply contain several sentences.

Maybe things would be clearer if we thought of the element <segment> as a "part" rather than a "segment"? The Segmentation representation addresses how to organize and manipulate such parts.

<unit id='1'>
 <segment>
  <source>Sentence one. Sentence two.</source>
 </segment>
</unit>

<unit id='1'>
 <segment>
  <source>Sentence one. </source>
 </segment>
 <segment>
  <source> Sentence two.</source>
 </segment>
</unit>

Maybe, viewed from that angle, it's clearer that such an element needs to be part of the core?
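To make the "part" view concrete: if segmentation is not done at extraction time (Dave's point), a later step can take a unit holding one un-segmented part and split it into several parts. The sketch below is only an illustration under stated assumptions: it uses the simplified, namespace-free unit/segment/source structure from the examples above, plain-text sources with no inline codes, and a naive regex sentence splitter (the names `segment_unit` and `SENTENCE_BREAK` are hypothetical, not part of any XLIFF API). Real tools would use language-aware rules.

```python
import re
import xml.etree.ElementTree as ET

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
# Illustrative only; real segmenters apply language-aware rules.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def segment_unit(unit: ET.Element) -> ET.Element:
    """Return a new <unit> with one <segment> per sentence.

    Assumes the simplified structure from the examples: a <unit> holding
    <segment> elements, each with a plain-text <source> (no namespace,
    no inline codes).
    """
    result = ET.Element(unit.tag, unit.attrib)
    for seg in unit.findall("segment"):
        text = seg.findtext("source", default="")
        # Each sentence of the original part becomes its own part.
        for sentence in SENTENCE_BREAK.split(text.strip()):
            if sentence:
                new_seg = ET.SubElement(result, "segment")
                ET.SubElement(new_seg, "source").text = sentence
    return result


unsegmented = ET.fromstring(
    "<unit id='1'><segment><source>Sentence one. Sentence two.</source></segment></unit>"
)
segmented = segment_unit(unsegmented)
print(ET.tostring(segmented, encoding="unicode"))
# prints: <unit id="1"><segment><source>Sentence one.</source></segment><segment><source>Sentence two.</source></segment></unit>
```

Note that this sketch drops the whitespace between sentences, whereas the second example above keeps it inside the sources; a real representation would need to preserve it somewhere.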


