OASIS Mailing List Archives

xliff message


Subject: RE: [xliff] Segmentation as core or not


Hi Steven, all,

> We discussed this a little bit in IBM today.
> Our view would still be that segmentation 
> does not need to be in core for interchange.

I think most (hopefully all) of us would agree that one important criterion for an optional module is that it must not prevent tools that implement only the core from working properly.

So if the representation of sentence-segmentation is optional, it should not prevent a tool XYZ, which understands only the core elements, from working.

The question then is: how can tool XYZ work with a sentence-segmented file without knowing about <segment>?

<unit id='1'>
 <segment>
  <source>Sentence one. </source>
 </segment>
 <segment>
  <source>Sentence two.</source>
 </segment>
</unit>

I don't think it can.
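
To make the point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a core without <segment> would place <source> directly under <unit>; the function names are hypothetical and do not come from any XLIFF draft or library.

```python
import xml.etree.ElementTree as ET

# The segmented unit from the example above.
SEGMENTED_UNIT = """\
<unit id='1'>
 <segment>
  <source>Sentence one. </source>
 </segment>
 <segment>
  <source>Sentence two.</source>
 </segment>
</unit>"""


def core_only_sources(unit):
    """Hypothetical core-only reader: expects <source> as a direct child
    of <unit> and knows nothing about <segment>."""
    return [s.text for s in unit.findall("source")]


def segment_aware_sources(unit):
    """Hypothetical reader that understands <segment> and looks one level deeper."""
    return [s.text for s in unit.findall("segment/source")]


unit = ET.fromstring(SEGMENTED_UNIT)
core_result = core_only_sources(unit)        # finds no direct <source> children
aware_result = segment_aware_sources(unit)   # finds both sentences
print(core_result)
print(aware_result)
```

Under that assumption the core-only reader comes back with an empty list: the content is there, but only behind an element the tool does not know.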

The only way it could would be if a unit were to store two copies of the same content: one that is not sentence-segmented, and another reserved for the tools that implement the optional segmentation representation module.

Needless to say, this would result in a slew of troubles: Where does tool ABC (which implements segmentation) put its translation? How can tool XYZ (which does not implement segmentation) access it? How do we resolve differences in the source? Where do we put segment status? etc. Basically it's all the problems of 1.2 all over again. In 1.2 we had no choice because we needed to be backward compatible. But in 2.0 we can have a clean way of dealing with segments.
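
As a rough sketch of the "difference in source" problem, the Python snippet below assumes a hypothetical dual-copy layout in which a unit carries both an unsegmented <source> and a set of <segment> elements. Neither the layout nor the helper name is taken from any specification; they only illustrate how the two copies can drift apart once one tool edits its own copy.

```python
import xml.etree.ElementTree as ET

# Hypothetical dual-copy unit: the segmented copy has been edited
# ("two" became "2") while the unsegmented copy was left untouched.
DUAL_COPY_UNIT = """\
<unit id='1'>
 <source>Sentence one. Sentence two.</source>
 <segment>
  <source>Sentence one. </source>
 </segment>
 <segment>
  <source>Sentence 2.</source>
 </segment>
</unit>"""


def copies_agree(unit):
    """Return True if the unsegmented source equals the concatenated
    segment sources (illustrative consistency check only)."""
    whole = unit.findtext("source", default="")
    joined = "".join(s.text or "" for s in unit.findall("segment/source"))
    return whole == joined


dual_unit = ET.fromstring(DUAL_COPY_UNIT)
in_sync = copies_agree(dual_unit)
print(in_sync)
```

Here the check reports a mismatch, and nothing in the file says which of the two copies is authoritative.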

So far, the only rationale I've heard for making <segment> optional is the argument that segmentation is a different process and therefore should not be part of the core. But I think we have seen that segmentation in general is broader than sentence-segmentation and clearly also happens during extraction (see the example with ITS <withinTextRule/>), so that rationale doesn't really hold.

But maybe I'm missing other things: what are the advantages of keeping the segmentation representation optional?

Cheers,
-yves




