RE: [xliff] Segmentation as core or not

-------- Original Message --------
Subject: RE: [xliff] Segmentation as core or not
From: Helena S Chapman <hchapman@us.ibm.com>
Date: Tue, November 08, 2011 1:12 pm
To: "Rodolfo M. Raya" <rmraya@maxprograms.com>
Cc: xliff@lists.oasis-open.org

Rodolfo, you brought up an interesting point: "To apply segmentation process to an already existing XLIFF file is an optional task. Recording that such task has been performed is the optional part. For the process to be possible, the text must already be in the XLIFF file and it has to be in some containers."

I believe we are talking about two very distinct process activities here:
1. partition content into parts (core)
2. refine the definition of #1 into segments (module)

I agree any existing XLIFF file will already include "parts" of content. How these parts were defined, and by what tools, is something the module can then define. For example, one might expect metadata about what the parts mean according to another standard or a non-standard definition: word vs. sentence according to UAX#29, or paragraph vs. chapter based on the internal definition of Acme Translation Agency Inc. The latter is what Steven is referring to as logging.

We definitely should rethink the taxonomy of what we call "segmentation" today. Note that I didn't use the word "terminology" to further pollute the conversation.

Best regards,

Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From: "Rodolfo M. Raya" <rmraya@maxprograms.com>
To: <xliff@lists.oasis-open.org>
Date: 11/08/2011 04:37 AM
Subject: RE: [xliff] Segmentation as core or not
Sent by: <xliff@lists.oasis-open.org>

Hi,

I think there is a huge confusion between the segmentation process and storing segments in XLIFF.

Text extracted for translation and stored in an XLIFF file needs to be stored in some elements that act as containers. If XLIFF doesn't have containers for holding localizable text, then the localizable text can't be exchanged and the "L" and "I" fail in the XLIFF acronym.

Extracted text can be segmented before the XLIFF file is created (my tools have been doing this for years) or after the XLIFF has been created. A tool processing XLIFF files should not care about when segmentation was done.

More, the segmentation process is completely optional. To apply segmentation process to an already existing XLIFF file is an optional task. Recording that such task has been performed is the optional part. For the process to be possible, the text must already be in the XLIFF file and it has to be in some containers.

Storing translatable text in XLIFF files is not optional. Elements for holding that text are required and elements for holding the translations of that text are also an integral part of XLIFF.

What we have so far in the XLIFF schema draft is a set of elements and attributes for holding translatable text and its translations. In the schema we don't have information that indicates how and when segmentation process occurred.

In the wiki we have a proposal for decorating current schema draft with elements and attributes containing information about the segmentation process. The proposal in the wiki augments the scope of the basic elements already present in the schema draft by adding attributes and processing expectations to elements that must be present in any XLIFF file.

Although some attributes mentioned in the segmentation section in the wiki are not really necessary when an XLIFF file is created, the elements in which they appear are absolutely necessary. We can't document an element as part of the "core" schema and leave some of its attributes as optional in a separate "module".

Minimalism is a fancy trend. I like it very much and see it useful in some cases. We should not try to apply minimalism to the concept of XLIFF core; this would be a mistake as big as the mistake in XLIFF 1.2 that enabled custom extensions everywhere. Balance is important.

Regards,
Rodolfo
--
Rodolfo M. Raya rmraya@maxprograms.com
Maxprograms http://www.maxprograms.com

> -----Original Message-----
> From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf
> Of Yves Savourel
> Sent: Tuesday, November 08, 2011 3:24 AM
> To: xliff@lists.oasis-open.org
> Subject: RE: [xliff] Segmentation as core or not
>
> Hi Steven, all,
>
> > We discussed this a little bit in IBM today.
> > Our view would still be that segmentation does not need to be in core
> > for interchange.
>
> I think most (all hopefully) of us would probably agree that one important
> criteria for an optional module is that it does not prevent the tools
> implementing only the core to work properly.
>
> So if the representation of sentence-segmentation is optional it should not
> prevent a tool XYZ, which understands only the core elements, to work.
>
> The question then is how does tool XYZ can work with a
> sentence-segmented file without knowing about <segment>?
>
> <unit id='1'>
>  <segment>
>   <source>Sentence one. </source>
>  </segment>
>  <segment>
>   <source>Sentence two.</source>
>  </segment>
> </unit>
>
> I don't think it can.
>
> The only way it could, would be if a unit was to store two copies of the same
> content: one not sentence-segmented, and the other one reserved for the
> tools that would implement the optional segmentation representation
> module.
>
> Needless to say this would result in a slew of troubles: Where does tool ABC
> (which implements segmentation) puts its translation? How tools XYZ (which
> does not implement segmentation) can access it? How do we resolve
> difference in source? Where do we put segment status? etc. Basically it's all
> the problems of 1.2 all over again. In 1.2 we had no choice because we
> needed to be backward compatible. But 2.0 we can have a clean way of
> dealing with segments.
>
> So far, the only rationale I've heard for making <segment> optional, is the
> argument that segmentation is a different process and therefore should not
> be part of the core. But I think we have seen that segmentation in general is
> broader than sentence-segmentation and clearly happens also during
> extraction (see the example with ITS <withinTextRule/>), so that rationale
> doesn't really hold true.
>
> But maybe I'm missing other things: what are the advantages of keeping the
> segmentation representation optional?
>
> Cheers,
> -yves
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: xliff-help@lists.oasis-open.org
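
The question raised about tool XYZ can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the two functions are hypothetical, "tool XYZ" is assumed to look for <source> directly under <unit>, and "tool ABC" is assumed to look inside <segment>. Neither reflects an actual XLIFF 2.0 processor or the final schema.

```python
import xml.etree.ElementTree as ET

# The sentence-segmented unit quoted in the thread.
SEGMENTED_UNIT = """
<unit id='1'>
  <segment>
    <source>Sentence one. </source>
  </segment>
  <segment>
    <source>Sentence two.</source>
  </segment>
</unit>
""".strip()


def core_only_sources(unit):
    """Hypothetical tool XYZ: assumes <source> is a direct child of <unit>."""
    return [s.text for s in unit.findall("source")]


def segment_aware_sources(unit):
    """Hypothetical tool ABC: reads the <source> inside each <segment>."""
    return [s.text for s in unit.findall("segment/source")]


unit = ET.fromstring(SEGMENTED_UNIT)
xyz = core_only_sources(unit)      # finds nothing: the text is hidden in <segment>
abc = segment_aware_sources(unit)  # finds both sentences

print("tool XYZ sees:", xyz)
print("tool ABC sees:", abc)
```

Under these assumptions the core-only reader finds no translatable text at all, which is the situation described above when <segment> is left out of the core.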
