Subject: RE: [xliff] Segmentation as core or not
Hi Helena,

There is a confusion in terminology. Changing the element name to <part> helps in visualization but doesn’t solve the issue at hand.

An XLIFF file is a container for text extracted for localization. If there isn’t text to localize, there is no XLIFF because there is nothing to interchange (the “L” and the “I” in XLIFF would have no purpose).

In many cases, the text extracted for localization needs to be further partitioned to facilitate the translation process. There are cases in which translators prefer to translate whole paragraphs because that produces better translations. In other cases (probably the majority), translators prefer to translate sentences because that facilitates TM matching and translation reuse. The process of splitting extracted text into sentences is known as “segmentation”.

The issue listed in the wiki related to segmentation deals with the division of extracted text into “segments” and the rearrangement of the segmented text when the boundaries detected by an automated process do not suit the preferences of the translator.

Segmentation can be done during text extraction, when the XLIFF file is created, or in a second pass after the XLIFF has been created. Segmentation also happens at translation time, when translators merge or split existing segments.

An XLIFF file must have containers for the extracted text. Having those containers is not a “feature”, it is a necessity. Being able to split the text and store the “segments”, “parts” or “fragments” in the same XLIFF can be viewed as a feature that may be qualified as “core” or “module”.

The proposal currently in the wiki doesn’t make it easy to differentiate between text that has been “extracted” and text that has been “extracted and segmented”. If we had a clear distinction between just extracted and segmented, we would be able to tell whether the segmentation process and its result belong to the “core” or the “module” category.
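To make the terms above concrete, here is a minimal Python sketch of the two operations mentioned: automated segmentation of extracted text, and a translator-style merge when a detected boundary is wrong. The function names `segment` and `merge` and the boundary rule are illustrative assumptions, not part of XLIFF or of any tool; real segmenters use much richer rules.

```python
import re
from typing import List

# Naive assumption: a segment ends at '.', '!' or '?' followed by whitespace.
# Real segmenters also handle abbreviations, numbers and locale-specific rules.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> List[str]:
    """Split extracted text into sentence-like segments."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def merge(segments: List[str], index: int) -> List[str]:
    """Merge segment `index` with the one after it, as a translator might."""
    merged = segments[index] + " " + segments[index + 1]
    return segments[:index] + [merged] + segments[index + 2:]


extracted = "Dr. Smith arrived. He sat down."
auto = segment(extracted)   # the naive rule wrongly breaks after "Dr."
fixed = merge(auto, 0)      # the translator repairs the bad boundary
print(auto)
print(fixed)
```

The abbreviation “Dr.” shows why automatically detected boundaries sometimes have to be rearranged afterwards, which is the situation the wiki issue describes.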
When segmentation is done while the XLIFF file is being generated, each segment can be represented as a unit for translation. That was the original way of working with XLIFF 1.0 and 1.1.

XLIFF 1.2 introduced the notion of representing segmentation in the XLIFF document. Working with XLIFF 1.2, you can have a segmented file in which each <trans-unit> contains one segment, or you can have files that contain multiple segments in a <trans-unit> element, each of them enclosed in special markup built from a combination of <seg-source> and <mrk> elements.

The model for representing segmentation introduced in XLIFF 1.2 has several problems that must be fixed in XLIFF 2.0. The proposal for using <unit>, <segment> and <ignorable> that we have in the current draft of the XLIFF schema allows representing segmentation. The problem with the schema is that it does not tell you whether the text contained in the XLIFF file has been just extracted or extracted and segmented.

The work you did with Yves in the wiki helps in understanding the status of the extracted text. With the attributes, elements and processing expectations you designed, it is possible to know whether the text has been segmented, whether further segmentation is allowed and what restrictions apply. It’s a very nice design.

The discussion is about the qualification of your work. Is it essential or is it optional? If it is essential, it is a “core” feature, and the elements and attributes it uses should be in the main XML Schema and documented as an integral part of XLIFF. If representing segmentation is an optional goal, then those elements and attributes should live in a separate optional XML Schema (a “module”) and be documented in an annex of the specification or in a separate guideline.

In my personal opinion, representing segmentation as it was designed should be a required part of the XLIFF 2.0 standard. I would call it a “core” feature.
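As a rough illustration of the XLIFF 1.2 model described above, the following Python sketch counts the segments marked inside each <trans-unit>. The sample document, its file name `demo.txt` and the helper name `segment_counts` are made up for this example; the element names, the `mtype="seg"` attribute and the namespace are those of XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; the prefix "x" is only a local alias for the queries.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample: unit 1 has no segmentation markup, unit 2 has two segments.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>First sentence. Second sentence.</source>
      </trans-unit>
      <trans-unit id="2">
        <source>First sentence. Second sentence.</source>
        <seg-source><mrk mtype="seg" mid="1">First sentence.</mrk> <mrk mtype="seg" mid="2">Second sentence.</mrk></seg-source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def segment_counts(xliff_text):
    """Map each trans-unit id to its number of <mrk mtype="seg"> segments."""
    root = ET.fromstring(xliff_text)
    counts = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        marks = unit.findall("x:seg-source/x:mrk[@mtype='seg']", NS)
        counts[unit.get("id")] = len(marks)
    return counts


print(segment_counts(SAMPLE))
```

A count of 0 only says that no segmentation markup is present. It cannot distinguish text that was merely extracted from a file that was segmented at extraction time with one segment per <trans-unit>, which is the kind of ambiguity discussed above.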
Regards,
Rodolfo

--
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman

It almost reads like what the localization industry is used to calling a "segment" is really a "partition": basically something that has been cut and classified but could be further divided or broken off into finer fragments? Since I have only been involved in localization topics for the last 3-4 years, I am probably close to having un-tainted eyes.