
Subject: Segmentation sub committee proposal for main XLIFF TC
From: Andrzej Zydron <azydron@xml-intl.com>
To: xliff-seg@lists.oasis-open.org
Date: Sat, 30 Oct 2004 22:20:57 +0100
Hi Everyone,

The following revised proposal should (hopefully) encompass all of the feedback 
from Tuesday's meeting. I have made a few small changes. I have not included the 
"segment-" prefix in the attribute names because, on reflection, they do not 
necessarily refer to segmentation issues. In addition, it appears that the use 
of <group merged-translations="yes"> implies the use of equivalent="no" on all 
child <trans-unit> elements.

The following proposals have been prepared by the XLIFF Segmentation 
sub-committee for consideration by the main XLIFF Technical Committee:

1) A new attribute should be introduced at the <group> and <trans-unit> element 
levels to indicate whether any child <target> element content is a direct 
translation of the corresponding <source> element:

equivalent (default value "yes")

The default value for this attribute is "yes", so the attribute can be omitted 
wherever the default applies. If the attribute is set to "no", the translation 
in any child <target> element is not a direct equivalent of the <source> and 
SHOULD NOT be loaded into translation memory. This attribute allows conforming 
systems to exclude from a translation memory system any text items whose target 
text has been marked as not a direct equivalent of the source text.

Example - fixed-length fields have forced the translator to place non-equivalent 
text against individual trans-units in order to display the text; the 
individual translations are not direct equivalents of the source text:

<trans-unit id="t.1" equivalent="no">
   <source>Constrained text for limited</source>
   <target>Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t.2" equivalent="no">
   <source>display for English</source>
   <target>ograniczonego pola</target>
</trans-unit>

The translation meets the application requirements, but is not a direct 
translation of the source and should not be loaded into a leveraged translation 
memory database.
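As an illustration of how a conforming tool might honour the attribute when 
loading translation memory, here is a minimal Python sketch using the standard 
ElementTree module. It assumes the XLIFF namespace is omitted (as in the 
examples above) and that an "equivalent" value set on a <group> is inherited by 
descendant <trans-unit> elements unless they override it; the proposal does not 
spell out inheritance. The unit "t.0" is a made-up sample.

```python
import xml.etree.ElementTree as ET


def tm_candidates(root):
    """Yield (id, source, target) for trans-units that may be loaded into TM.

    Assumptions: no XLIFF namespace (as in the examples), and an `equivalent`
    value on an enclosing <group> is inherited by descendant <trans-unit>
    elements unless they set their own value.
    """
    def walk(elem, inherited):
        # The element's own attribute wins; otherwise use the inherited value.
        value = elem.get("equivalent", inherited)
        if elem.tag == "trans-unit":
            target = elem.findtext("target", default="")
            # Only direct equivalents with a non-empty target are TM candidates.
            if value == "yes" and target:
                yield elem.get("id"), elem.findtext("source", default=""), target
            return
        for child in elem:
            yield from walk(child, value)

    yield from walk(root, "yes")  # "yes" is the proposed default


sample = ET.fromstring("""<body>
  <trans-unit id="t.0">
    <source>Save</source>
    <target>Zapisz</target>
  </trans-unit>
  <trans-unit id="t.1" equivalent="no">
    <source>Constrained text for limited</source>
    <target>Tekst angielski dla</target>
  </trans-unit>
</body>""")

result = list(tm_candidates(sample))
print(result)  # [('t.0', 'Save', 'Zapisz')]
```

Unit t.1 is skipped because it is marked equivalent="no"; t.0 relies on the 
default and is kept.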

2) A new attribute should be introduced for the <group> element that indicates 
that the translation of the encompassed <trans-unit> elements must be treated as 
a whole and not as individual elements:

merged-translations (no default value, not mandatory)

Example 1) - linguistically the translation only makes sense if the text within 
the <group> element is taken as a whole:

<group merged-translations="yes">
   <trans-unit id="1" equivalent="no">
     <source>The text goes on,</source>
     <target>Texten går vidare</target>
   </trans-unit>
   <trans-unit id="2" equivalent="no">
     <source>and on, and on, and on.</source>
     <target>och vidare, och vidare.</target>
   </trans-unit>
</group>

Example 2) - incorrect segmentation requires that the translation be taken as a 
whole:

<group merged-translations="yes">
   <trans-unit id="t1" equivalent="no">
     <source>The German acronym v.</source>
     <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
   </trans-unit>
   <trans-unit id="t2" equivalent="no">
     <source>OT signifies the top dead center position for an engine.</source>
     <target/>
   </trans-unit>
</group>

The use of the merged-translations="yes" attribute at the group level implies 
that any child <trans-unit> elements should have the "equivalent" attribute set 
to "no".
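One way a tool might consume a merged group is to rebuild a single source/target 
pair from its children and to flag children that do not carry equivalent="no". 
The Python sketch below does that for Example 2. The helper name merge_group is 
hypothetical, joining child texts with a single space is an assumption (some 
languages need other joining rules), and "implies" is read strictly here, so a 
child without an explicit equivalent="no" is reported. The proposal does not 
say whether the merged pair itself should go into translation memory.

```python
import xml.etree.ElementTree as ET


def merge_group(group):
    """Rebuild one (source, target) pair from a merged-translations group.

    Hypothetical helper. Assumptions: child texts are joined with a single
    space, and any child without an explicit equivalent="no" is reported
    as a conflict with the implied value.
    """
    if group.get("merged-translations") != "yes":
        raise ValueError("group is not marked merged-translations='yes'")
    units = group.findall("trans-unit")
    conflicts = [u.get("id") for u in units if u.get("equivalent") != "no"]
    source = " ".join(u.findtext("source", default="") for u in units)
    # Skip empty targets such as <target/>, which occur when the whole
    # translation was placed against an earlier trans-unit.
    targets = (u.findtext("target", default="") for u in units)
    target = " ".join(t for t in targets if t)
    return source, target, conflicts


group = ET.fromstring("""<group merged-translations="yes">
   <trans-unit id="t1" equivalent="no">
     <source>The German acronym v.</source>
     <target>Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
   </trans-unit>
   <trans-unit id="t2" equivalent="no">
     <source>OT signifies the top dead center position for an engine.</source>
     <target/>
   </trans-unit>
</group>""")

source, target, conflicts = merge_group(group)
print(source)     # The German acronym v. OT signifies the top dead center position for an engine.
print(target)     # Niemiecki skrót v. OT oznacza górną pozycję silnika.
print(conflicts)  # []
```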

3) Segmentation for "unprocessed" XLIFF files. Where an XLIFF file has been 
created by a filter and no segmentation has been applied to the individual 
<source> elements, the XLIFF file can be treated as a normal XML file in which 
the <target> elements contain the text to be segmented.

The XLIFF file's <target> elements, which at this stage contain the source text, 
can have segmentation information added by means of a segmentation namespace 
such as xml:tm, using SRX rules. A normal XML-to-XLIFF extraction can then be 
run on the file using either an XSLT transformation or a program. The resulting 
skeleton file enables the translated text to be merged back into the original 
XLIFF document. An XSLT transformation can then strip out the segmentation 
namespace, resulting in a "translated" original, unsegmented XLIFF file. This 
solution is ideal for a production process that can handle pipeline 
transformations and where the XLIFF document contains raw, unprocessed 
extracted text.
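The pipeline above can be sketched in Python with ElementTree, collapsing the 
separate skeleton file and XSLT steps into in-memory operations on one tree. 
This is only an illustration under several assumptions: the namespace URI is a 
hypothetical stand-in for xml:tm (not its real URI), a naive regex sentence 
splitter stands in for real SRX rules, <target> elements hold plain text with no 
inline markup, and upper-casing stands in for actual translation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespace standing in for xml:tm; not its real URI.
SEG_NS = "urn:example:segmentation"
SEG = f"{{{SEG_NS}}}seg"


def split_sentences(text):
    """Naive stand-in for SRX rules: break after . ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_targets(root):
    """Step 1: wrap the plain text of each <target> in namespaced segments."""
    counter = 0
    for target in list(root.iter("target")):
        sentences = split_sentences(target.text or "")
        target.text = None
        for sentence in sentences:
            counter += 1
            ET.SubElement(target, SEG, id=f"s{counter}").text = sentence


def extract(root):
    """Step 2: pull out the segments to be translated, keyed by segment id."""
    return {seg.get("id"): seg.text for seg in root.iter(SEG)}


def merge_and_strip(root, translations):
    """Steps 3-4: merge translations back, then remove the segment markup."""
    for target in list(root.iter("target")):
        segs = target.findall(SEG)
        if not segs:
            continue
        # Untranslated segments fall back to their original text.
        target.text = " ".join(translations.get(s.get("id"), s.text) for s in segs)
        for s in segs:
            target.remove(s)


doc = ET.fromstring(
    "<body><trans-unit id='1'>"
    "<source>Open the file. Then save it.</source>"
    "<target>Open the file. Then save it.</target>"
    "</trans-unit></body>"
)
segment_targets(doc)
segments = extract(doc)
# Pseudo-translation (upper-casing) stands in for real translation work.
merge_and_strip(doc, {sid: text.upper() for sid, text in segments.items()})
print(segments)  # {'s1': 'Open the file.', 's2': 'Then save it.'}
print(doc.find("trans-unit/target").text)  # OPEN THE FILE. THEN SAVE IT.
```

In a real pipeline each step would more likely be a separate XSLT pass or 
program, as described above.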

This solution is not necessarily suited to interactive segmentation carried out 
within a user-interface-centered environment, nor to cases where the XLIFF file 
has already had some form of translation memory matching applied to it. The 
XLIFF segmentation sub-committee will continue working toward a solution for 
these types of environments.

Best Regards,

AZ

-- 


email - azydron@xml-intl.com
smail - c/o Mr. A.Zydron
         PO Box 2167
         Gerrards Cross
         Bucks SL9 8XF
         United Kingdom
Mobile +(44) 7966 477 181
FAX    +(44) 1753 480 465
www - http://www.xml-intl.com
