OASIS Mailing List Archives

 



xliff message



Subject: RE: [xliff] The Segmentation feature


Thanks for the input, Rodolfo,


> ...you may have a <unit> with a <segment> 
> containing a paragraph with several sentences 
> and that would be a fragment "segmented" at 
> paragraph level that a user or process may 
> or may not wish to re-segment at sentence level 
> later.

Indeed.

The problem, then, is: how do we know whether content made of a single <segment> has gone through a (re-)segmentation process?

Which leads to: Why do we need to know this?

Mostly to avoid re-segmenting something already segmented.

We can have input XLIFF that has only partially gone through segmentation, or a mix of pre-segmented and un-segmented files. It may be important to be able to make the distinction.

For example, how does a conditional segmentation step (i.e. segment only if it's not done yet) know which of these two entries to segment?

<segment>
 <source>Mr. Holmes is from the U.K. not the U.S.</source>
</segment>
...
<segment>
 <source>Is Dr. Watson from there too? Yes: both are.</source>
</segment>

We don't want to re-segment the first one, but we may want to segment the second. This matters if the next step is leveraging against translation memories (TMs).
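To make the problem concrete, here is a rough sketch of such a conditional step. It assumes a hypothetical segmented="yes" attribute standing in for the flag under discussion (neither the name nor the value is defined by XLIFF), and a deliberately incomplete splitter that knows "Mr." and "Dr." but not "U.K.". Without the flag, the splitter would cut the first entry after "U.K."; with it, the first entry is left alone and only the second is split.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: segmented="yes" stands in for the flag discussed in
# this thread; it is not an attribute defined by any XLIFF version.
DOC = """<unit>
 <segment segmented="yes">
  <source>Mr. Holmes is from the U.K. not the U.S.</source>
 </segment>
 <segment>
  <source>Is Dr. Watson from there too? Yes: both are.</source>
 </segment>
</unit>"""

# Deliberately incomplete rule set: it knows "Mr." and "Dr." but not "U.K.".
ABBREVIATIONS = {"Mr.", "Dr."}

def split_sentences(text):
    """Split after tokens ending in . ? or ! unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".?!" and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

def conditional_segment(unit):
    """Segment only the <segment> elements not already flagged as segmented."""
    result = []
    for seg in unit.findall("segment"):
        text = seg.findtext("source")
        if seg.get("segmented") == "yes":
            result.append([text])  # leave the existing segmentation untouched
        else:
            result.append(split_sentences(text))
    return result

unit = ET.fromstring(DOC)
print(conditional_segment(unit))
```

The point is not the splitter itself: any rule set has gaps, and the flag is what keeps a tool from "fixing" something that was already segmented correctly upstream.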

Another reason is that some input formats have different ways to represent segmented and non-segmented entries (e.g. TTX, Trados-RTF, XLIFF 1.2), and you may want to preserve the same state in the output. This is common for us when we apply processes that don't involve re-segmentation: we want the output to stay the same as the input.
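As an illustration of the XLIFF 1.2 case: there, segmentation is carried by a <seg-source> element holding <mrk mtype="seg"> markers, so even a one-segment unit can be marked as segmented. A minimal sketch of a writer that round-trips that state might look like this; the Unit class and its was_segmented field are hypothetical (something the reader would fill in from the input file), and joining segments with a single space is a simplification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Unit:
    segments: list[str]
    was_segmented: bool  # hypothetical: set by the reader from the input file's own markup

def write_trans_unit(uid: str, unit: Unit) -> ET.Element:
    """Emit an XLIFF 1.2 <trans-unit>, preserving whether the input was segmented."""
    tu = ET.Element("trans-unit", id=uid)
    source = ET.SubElement(tu, "source")
    source.text = " ".join(unit.segments)  # simplification: single-space separators
    if unit.was_segmented:
        # XLIFF 1.2 marks segments with <mrk mtype="seg"> inside <seg-source>,
        # so a one-segment unit can still be flagged as segmented.
        seg_source = ET.SubElement(tu, "seg-source")
        for i, text in enumerate(unit.segments, start=1):
            mrk = ET.SubElement(seg_source, "mrk", mtype="seg", mid=str(i))
            mrk.text = text
            if i < len(unit.segments):
                mrk.tail = " "  # inter-segment whitespace stays outside the markers
    return tu

plain = Unit(["Is Dr. Watson from there too? Yes: both are."], was_segmented=False)
done = Unit(["Mr. Holmes is from the U.K. not the U.S."], was_segmented=True)
print(ET.tostring(write_trans_unit("1", plain), encoding="unicode"))
print(ET.tostring(write_trans_unit("2", done), encoding="unicode"))
```

Both units contain a single segment; only the recorded state decides whether <seg-source> is written, which is exactly the information that gets lost if the intermediate format cannot express it.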

I actually don't like that 'segmented' attribute very much either, nor the processing expectations that may go with it. But I don't see another way to know whether a single <segment> element has gone through segmentation or not.

So two questions:

- is this information worth having in XLIFF?

- and if yes, is there a better way?



> ...Tools are allowed to re-segment by splitting 
> the existing segment.

I tend to agree. But I've seen people add extended attributes to control whether or not a file can be re-segmented, so maybe this is a common need?
One rationale I've heard is that, in some cases, the party generating the XLIFF file wants to force a specific segmentation to ensure that the party doing the translation gets the proper translation matches. There may be other reasons.

I'm not sure whether this falls under Shirley's "permission control and validation" feature, but there may be a need for flags that allow or disallow tools to perform certain actions on a file. Re-segmentation is one possibility; maybe there are more?
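Purely to illustrate what such a flag could mean for a tool, a sketch: the "resegment" action name is hypothetical, not something XLIFF defines, and the splitter is passed in by the caller. Following Rodolfo's point, re-segmentation here only ever splits existing segments; it never merges them.

```python
from typing import Callable, Iterable

# Hypothetical action name; no XLIFF version defines this flag.
RESEGMENT = "resegment"

def resegment(segments: list[str], allowed_actions: Iterable[str],
              splitter: Callable[[str], list[str]]) -> list[str]:
    """Split each segment with `splitter`, but only if the file allows re-segmentation."""
    if RESEGMENT not in set(allowed_actions):
        # The file's creator locked the segmentation (e.g. to guarantee TM matches).
        return list(segments)
    out: list[str] = []
    for seg in segments:
        # Re-segmentation only splits existing segments; it never merges them.
        out.extend(splitter(seg))
    return out

split_on_bar = lambda s: s.split("|")
print(resegment(["a|b", "c"], {"resegment"}, split_on_bar))
print(resegment(["a|b", "c"], set(), split_on_bar))
```

Other actions could be added to the same set of allowed actions if the group finds more candidates.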

Cheers,
-ys



