Subject: Segmentation and filters
Hi,

First of all, I would like to thank Magnus for the hard work he has put in so far and for the detailed document he has prepared. It provides a clear starting point for further discussion. To kick off this thread, here are my views on the segmentation issue:

1) Segmentation within XLIFF should not be mandated. It should be optional. There are implementations, such as xml:tm, where segmentation is done before extraction. It is also quite easy to envisage situations where XLIFF is the output of an existing translation workbench system that has already segmented and pre-matched the data before sending it out to a translator, who will import it into an XLIFF-aware editing environment. I can also see Magnus' point that quite often XLIFF will contain unsegmented data. One solution would be to provide an optional "segmented" attribute at the <file> element level, stating that the data has already been segmented, with a default value of "false". If the data has been segmented, then an xlink attribute pointing to the SRX URL could also be provided.

2) One way of handling segmentation within XLIFF is to create a secondary XLIFF document from the current XLIFF document, with a separate <trans-unit> element for each segment. This would effectively be a segmentation extraction of the original XLIFF document. It has the one significant advantage that no further extensions to the XLIFF standard are required. It does away with all the potential complexity of trying to nest <trans-unit> elements or of adding workable syntax to cope with multiple source and target segments within a <trans-unit>. Because XLIFF is a well-defined XML format, it is very easy to write an extraction + segmentation filter that produces an XLIFF file where the <trans-unit> elements are at the segment level, along with a skeleton file for merging back. After translation you can elect to store leveraged memory at both the segmented and unsegmented levels.
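The extraction in point 2 can be sketched roughly as follows. This is a minimal Python illustration standing in for an XSLT transformation, not a reference implementation. It assumes segment boundaries have already been marked with wrapper elements in a separate namespace (here tm:tu, with the namespace URI used in the example data), that the XLIFF elements themselves are unnamespaced as in the example, and that the wrappers are not nested. The function name extract_segments and the <ext/> placeholder handling are hypothetical choices made for the sketch.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace URI copied from the example data; helper names are hypothetical.
TM_NS = "http://www.xml-intl.com/dtd/tm.xsd"
TM_TU = f"{{{TM_NS}}}tu"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_segments(body):
    """Split a tm:tu-marked <body> into (segmented body, skeleton body)."""
    skeleton = copy.deepcopy(body)
    # "segmented" is the attribute proposed in this message, not part of XLIFF.
    segmented = ET.Element("body", {"segmented": "true"})
    for source in skeleton.iter("source"):
        lang = source.get(XML_LANG)
        for tu in list(source.iter(TM_TU)):
            seg_id = tu.get("id")
            # One segment-level <trans-unit> per tm:tu, keeping inline markup.
            unit = ET.SubElement(segmented, "trans-unit", {"id": seg_id})
            seg_source = ET.SubElement(unit, "source")
            if lang is not None:
                seg_source.set(XML_LANG, lang)
            seg_source.text = tu.text
            seg_source.extend([copy.deepcopy(child) for child in tu])
            # In the skeleton, swap the segment content for an <ext/> placeholder.
            tu.text = None
            for child in list(tu):
                tu.remove(child)
            ET.SubElement(tu, "ext", {"id": seg_id})
    return segmented, skeleton


SAMPLE = """<body xmlns:tm="http://www.xml-intl.com/dtd/tm.xsd">
<trans-unit id="3"><source xml:lang="en-US"><tm:tu id="3.1">Ambiguous sentence.</tm:tu> <tm:tu id="3.2">More <bpt id="1">[LINK-to-toc:</bpt>content<ept id="1">]</ept>.</tm:tu></source></trans-unit>
</body>"""

segmented, skeleton = extract_segments(ET.fromstring(SAMPLE))
for unit in segmented:
    # Print each segment id with its plain text (inline codes flattened).
    print(unit.get("id"), "".join(unit.find("source").itertext()))
```

Text that sits between two wrappers (for example the space between sentences) stays in the skeleton, so nothing outside a segment is lost when the translated segments are merged back.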
Here is an example based on Magnus' data:

Step 1: Original XLIFF file:

    <body>
      <trans-unit id="1">
        <source xml:lang="en-US">The Document Title</source>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en-US">First sentence. <bpt id="1">[ITALIC:</bpt>This is an important sentence.<ept id="1">]</ept></source>
      </trans-unit>
      <trans-unit id="3">
        <source xml:lang="en-US">Ambiguous sentence. More <bpt id="1">[LINK-to-toc:</bpt>content<ept id="1">]</ept>.</source>
      </trans-unit>
    </body>

Step 2: Introduce segmentation markup, in its own namespace, into the XLIFF file:

    <body xmlns:tm="http://www.xml-intl.com/dtd/tm.xsd">
      <trans-unit id="1">
        <source xml:lang="en-US"><tm:tu id="1.1">The Document Title</tm:tu></source>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en-US"><tm:tu id="2.1">First sentence.</tm:tu> <bpt id="1">[ITALIC:</bpt><tm:tu id="2.2">This is an important sentence.</tm:tu><ept id="1">]</ept></source>
      </trans-unit>
      <trans-unit id="3">
        <source xml:lang="en-US"><tm:tu id="3.1">Ambiguous sentence.</tm:tu> <tm:tu id="3.2">More <bpt id="1">[LINK-to-toc:</bpt>content<ept id="1">]</ept>.</tm:tu></source>
      </trans-unit>
    </body>

Step 3: Using a simple XSLT transformation, create a new segmented XLIFF file:

    <body segmented="true" srx="http://www.xml-intl.com/srx/en-US.srx">
      <trans-unit id="1.1">
        <source xml:lang="en-US">The Document Title</source>
      </trans-unit>
      <trans-unit id="2.1">
        <source xml:lang="en-US">First sentence.</source>
      </trans-unit>
      <trans-unit id="2.2">
        <source xml:lang="en-US">This is an important sentence.</source>
      </trans-unit>
      <trans-unit id="3.1">
        <source xml:lang="en-US">Ambiguous sentence.</source>
      </trans-unit>
      <trans-unit id="3.2">
        <source xml:lang="en-US">More <bpt id="1">[LINK-to-toc:</bpt>content<ept id="1">]</ept>.</source>
      </trans-unit>
    </body>

and a skeleton file:

    <body xmlns:tm="http://www.xml-intl.com/dtd/tm.xsd">
      <trans-unit id="1">
        <source xml:lang="en-US"><tm:tu id="1.1"><ext id="1.1"/></tm:tu></source>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en-US"><tm:tu id="2.1"><ext id="2.1"/></tm:tu> <bpt id="1">[ITALIC:</bpt><tm:tu id="2.2"><ext id="2.2"/></tm:tu><ept id="1">]</ept></source>
      </trans-unit>
      <trans-unit id="3">
        <source xml:lang="en-US"><tm:tu id="3.1"><ext id="3.1"/></tm:tu> <tm:tu id="3.2"><ext id="3.2"/></tm:tu></source>
      </trans-unit>
    </body>

Step 4: Put the segmented XLIFF file through whatever matching process you want, to produce:

    <body segmented="true" srx="http://www.xml-intl.com/srx/en-US.srx">
      <trans-unit id="1.1">
        <source xml:lang="en-US">The Document Title</source>
        <target xml:lang="sv-SE" state="translated" state-qualifier="leveraged-tm">Dokumentrubriken</target>
      </trans-unit>
      <trans-unit id="2.1">
        <source xml:lang="en-US">First sentence.</source>
        <target xml:lang="sv-SE" state="translated" state-qualifier="leveraged-tm">Första meningen.</target>
      </trans-unit>
      <trans-unit id="2.2">
        <source xml:lang="en-US">This is an important sentence.</source>
        <alt-trans origin="translation memory" match-quality="80%">
          <source xml:lang="en-US">This is an extremely important sentence.</source>
          <target xml:lang="sv-SE">En mycket viktig mening.</target>
        </alt-trans>
      </trans-unit>
      <trans-unit id="3.1">
        <source xml:lang="en-US">Ambiguous sentence.</source>
        <target xml:lang="sv-SE" state="needs-review-translation">Omstridd mening.</target>
        <note annotates="target" from="Swedish Translator">This translation may not be appropriate. Please evaluate it carefully!</note>
      </trans-unit>
      <trans-unit id="3.2">
        <source xml:lang="en-US">More <bpt id="1">[LINK-to-toc:</bpt>content<ept id="1">]</ept>.</source>
        <target xml:lang="sv-SE" state="translated">Ytterligare <bpt id="1">[LINK-to-toc:</bpt>innehåll<ept id="1">]</ept>.</target>
      </trans-unit>
    </body>

Step 5: Using nothing more than XSLT, merge the translated segments back into the skeleton, then strip out the segmentation namespace elements with another simple XSLT transformation. You arrive at a translated XLIFF file with the same structure as the original unsegmented source-language file.

This approach has the benefit of requiring minimal, or possibly no, change to the existing excellent XLIFF specification.

Hope this helps kick off the thread.

Regards,
AZ

--
email - azydron@xml-intl.com
smail - Mr. A.Zydron
        24 Maybrook Gardens, High Wycombe, Bucks HP13 6PJ
Mobile +(44) 7966 477181
FAX +(44) 870 831 8868
www - http://www.xml-intl.com