Subject: FAQ: Criteria for choosing between the different approaches for representing inline formatting in XLIFF
Dear all,

Please find below an FAQ which I have compiled from input of the XLIFF TC.

Best regards,
Christian

============================================================================

FAQ: Criteria for choosing between the different approaches for representing inline formatting in XLIFF

XLIFF has two mechanisms for representing inline formatting of the original:

1. Abstraction: original inline markup is mapped to generic placeholder tags (<g> and <x>)
2. Encapsulation: original inline markup is encapsulated in typed placeholder tags (<bpt>, <ept>, <ph> and <it>)

Thus, a snippet of RTF source content with bold as inline formatting like

  This is \b bold\b0.

could be represented in XLIFF in two ways:

A. Via Abstraction

  <trans-unit id="1">
    <source>This is <g id="1" ctype="bold">bold</g>.</source>
  </trans-unit>

B. Via Encapsulation

  <trans-unit id="1">
    <source>This is <bpt id="1" ctype="bold">\b</bpt>bold<ept id="1">\b0</ept>.</source>
  </trans-unit>

The example already indicates two major differences between the approaches:

- Abstraction provides maximum leveraging of translation memory data across incompatible resource types. RTF content like "This is \b bold\b0." can be represented in the same way as HTML content like "This is <b>bold</b>.". Thus, if you work with a translation memory and have already translated the RTF, you will get a good match when translating the HTML.

- Abstraction generates the need to have or store information about the original format. Without this information, it will not be possible to restore "This is <g id="1" ctype="bold">bold</g>." into RTF or HTML.

These differences provide some hints on when to choose one approach over the other. Other aspects which may need to be considered are listed below. Some of these aspects pertain to the choice between abstraction and encapsulation, whereas others pertain to details within one of these approaches. Example: when to use <ph> rather than <bpt>/<ept>.

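To make the translation memory point concrete, here is a minimal Python sketch. It is not part of any XLIFF tool; the function name to_xliff and the two bold-only patterns are hypothetical and only cover the single-bold-run snippets used in this FAQ. It shows that the abstracted form of the RTF and HTML snippets is identical, while the encapsulated forms differ because they carry the native codes.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical, bold-only patterns for illustration; a real filter
# would have to cover the complete source format.
RTF_BOLD = re.compile(r"\\b (.*?)\\b0")
HTML_BOLD = re.compile(r"<b>(.*?)</b>")


def to_xliff(text, pattern, open_code, close_code, abstract):
    """Return the content of a <source> element for `text` (bold only)."""
    out, pos, next_id = [], 0, 1
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))
        inner = escape(m.group(1))
        if abstract:
            # Abstraction: the native codes are dropped from the segment.
            out.append(f'<g id="{next_id}" ctype="bold">{inner}</g>')
        else:
            # Encapsulation: the native codes travel inside <bpt>/<ept>.
            out.append(
                f'<bpt id="{next_id}" ctype="bold">{escape(open_code)}</bpt>'
                f'{inner}<ept id="{next_id}">{escape(close_code)}</ept>'
            )
        next_id += 1
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


rtf = r"This is \b bold\b0."
html = "This is <b>bold</b>."

# Simplification: the RTF delimiter space after \b is not stored,
# mirroring the encapsulation example in this FAQ.
rtf_abs = to_xliff(rtf, RTF_BOLD, r"\b", r"\b0", abstract=True)
html_abs = to_xliff(html, HTML_BOLD, "<b>", "</b>", abstract=True)
rtf_enc = to_xliff(rtf, RTF_BOLD, r"\b", r"\b0", abstract=False)
html_enc = to_xliff(html, HTML_BOLD, "<b>", "</b>", abstract=False)

print(rtf_abs == html_abs)  # abstracted segments can be compared directly
print(rtf_enc == html_enc)  # encapsulated segments carry format-specific codes
```

With abstraction, a translation memory lookup sees the same segment for both formats. With encapsulation, the lookup would need to ignore or normalize the content of <bpt> and <ept> to achieve a comparable match.
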
There is no magic recipe; the overall setting needs to be examined before making the choice.

A. Export to TMX

If you want the end user to export XLIFF files as TMX, then <g> and <x> are bad choices. TMX 1.4 does not support <g> and <x> tags, so they have to be converted to something else when exporting. As the original markup is in the skeleton, it may be impossible to include the markup in the generated TMX file.

Aside: This might not be an issue anymore with TMX 2.0.

B. Splitting Segments

If you use <bpt>/<ept> pairs and the translator wants to split the segment in a way that separates the tags, problems arise because it is recommended that each <bpt> has its <ept> in the same <source> or <target>. With two <ph> elements, you can separate them without problems in most cases.

If you use <g>, it would be necessary to clone the <g> tag in the second segment, which is awkward. If you have already cloned a <g> element and the translator merges the two segments, you end up with duplicated <g> tags. This doesn't happen if you use <ph> instead.

C. Source Format

There are formats that don't require a skeleton, like Java Properties. In this case it is better to work with <bpt>/<ept>, <ph> and <it>.

For some formats, so-called XLIFF profiles (i.e. representation recommendations) have already been defined. Accordingly, you should consult the existing profiles to see if a case like yours has already been covered.

D. Processing Environment

<g> and <x> are appealing because they ensure that format information will not be spread across trans-units. With <bpt>/<ept> this is not the case. In the example below, the format begin and format end are not within one single trans-unit. This may cause trouble when the original format needs to be reconstructed. This holds true, for example, in environments which use XSLT-based processing, since challenging recursive calls would be needed.

  <trans-unit id="%%%2%%%">
    <source>Text 2 begins <bpt id="2" ctype="x-code" />code starts here.</source>
  </trans-unit>
  <trans-unit id="%%%3%%%">
    <source>And code ends here.<ept id="2" ctype="x-code" />Now comes next TEXT.</source>
  </trans-unit>

============================================================================

Christian Lieske
MultiLingual Technology Solutions (MLT)
SAP Language Services (SLS)
SAP Globalization Services
SAP AG
Dietmar-Hopp-Allee 16
D-69190 Walldorf
Germany
T +49 (62 27) 7 - 6 13 03
F +49 (62 27) 7 - 2 54 18
christian.lieske@sap.com
http://www.sap.com