Subject: FAQ: Criteria for choosing between the different approaches for representing inline formatting in XLIFF
From: "Lieske, Christian" <christian.lieske@sap.com>
To: <xliff@lists.oasis-open.org>
Date: Fri, 16 Mar 2007 09:44:38 +0100

Dear all,

Please find below an FAQ which I have compiled from input of the XLIFF TC.

Best regards,
Christian

============================================================================

FAQ: Criteria for choosing between the different approaches for representing
inline formatting in XLIFF

XLIFF has two mechanisms for representing the inline formatting of the
original:

1. Abstraction: original inline markup is mapped to generic placeholder tags
(<g> and <x>)
2. Encapsulation: original inline markup is encapsulated in typed
placeholder tags (<bpt>, <ept>, <ph> and <it>)

Thus, a snippet of RTF source content with bold as inline formatting, like

	This is \b bold\b0.

could be represented in XLIFF in two ways:

A. Via Abstraction

<trans-unit id="1">
 <source>This is <g id="1" ctype="bold">bold</g>.</source>
</trans-unit>

B. Via Encapsulation

<trans-unit id="1">
 <source>This is <bpt id="1" ctype="bold">\b</bpt>bold<ept id="1">\b0</ept>.</source>
</trans-unit>

The example already indicates two major differences between the approaches:

- Abstraction provides maximum leveraging of translation memory data across
incompatible resource types

  RTF content like "This is \b bold\b0." can be represented in the same way
as HTML content like "This is <b>bold</b>.". Thus, if you work with a
translation memory and have already translated the RTF, you will get a good
match when translating the HTML.

- Abstraction generates the need to have or store information about the
original format.

  Without this information, it will not be possible to restore
"This is <g id="1" ctype="bold">bold</g>." into RTF or HTML.
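
Both points can be illustrated with a minimal Python sketch. The patterns
and the names abstract() and restore() are hypothetical and cover only the
bold markup from the example; a real filter would need a proper parser and
XML escaping. The sketch shows that the RTF and the HTML snippet yield the
same abstracted string, and that restoring it requires format knowledge
which the <g> element does not carry.

```python
import re

# Hypothetical, deliberately tiny patterns: bold markup only.
RTF_BOLD = re.compile(r"\\b (.*?)\\b0")
HTML_BOLD = re.compile(r"<b>(.*?)</b>")
G_BOLD = re.compile(r'<g id="\d+" ctype="bold">(.*?)</g>')


def abstract(text, pattern):
    """Replace native bold markup with a generic <g> placeholder."""
    counter = 0

    def repl(match):
        nonlocal counter
        counter += 1
        return f'<g id="{counter}" ctype="bold">{match.group(1)}</g>'

    return pattern.sub(repl, text)


# The knowledge needed for restoring lives outside the abstracted content.
RESTORE = {
    "rtf": lambda m: "\\b " + m.group(1) + "\\b0",
    "html": lambda m: "<b>" + m.group(1) + "</b>",
}


def restore(source, fmt):
    """Turn <g ctype="bold"> back into the markup of the given format."""
    return G_BOLD.sub(RESTORE[fmt], source)


rtf = abstract(r"This is \b bold\b0.", RTF_BOLD)
html = abstract("This is <b>bold</b>.", HTML_BOLD)
```

Here rtf and html are identical strings, which is what gives the
translation memory match. restore() cannot work without being told the
target format.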

These differences provide some hints on when to choose one approach over the
other. Other aspects which may need to be considered are listed below.

Some of these aspects pertain to the choice between abstraction and
encapsulation, whereas others pertain to details within one of these
approaches. Example: when to use <ph> rather than <bpt>/<ept>.

There is no magic recipe; the overall setting needs to be examined before
making the choice.

A. Export to TMX

If you want the end user to export XLIFF files as TMX, then <g> and <x> are
bad choices. TMX 1.4 does not support <g> and <x> tags, so they have to be
converted to something else when exporting. Aside: This might not be an
issue anymore with TMX 2.0.

Because abstraction leaves the original markup in the skeleton rather than
in the inline tags, it may be impossible to include the markup in the
generated TMX file.
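
A minimal sketch of such a conversion in Python is shown below. It assumes
a <source> element without namespace that contains only <g> inline tags,
and the function name g_to_tmx() is hypothetical. Each <g> becomes a TMX
<bpt>/<ept> pair with empty content, because the native markup is not
available in the XLIFF inline tags. Whether empty pairs are acceptable
depends on the tool that consumes the TMX.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def g_to_tmx(elem):
    """Serialize the content of elem, turning each <g> into <bpt/>...<ept/>."""
    parts = [escape(elem.text or "")]
    for child in elem:
        if child.tag != "g":
            # Other inline tags are out of scope for this sketch.
            raise ValueError(f"unsupported inline tag: {child.tag}")
        i = quoteattr(child.get("id"))
        ctype = quoteattr(child.get("ctype", "x-unknown"))
        # No native code is available, so the pair stays empty.
        parts.append(f"<bpt i={i} type={ctype}/>")
        parts.append(g_to_tmx(child))
        parts.append(f"<ept i={i}/>")
        parts.append(escape(child.tail or ""))
    return "".join(parts)


source = ET.fromstring(
    '<source>This is <g id="1" ctype="bold">bold</g>.</source>'
)
seg = "<seg>" + g_to_tmx(source) + "</seg>"
```

For the abstraction example from above, seg holds the <seg> content with
an empty <bpt i="1" type="bold"/> and <ept i="1"/> around "bold".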

B. Splitting Segments

If you use <bpt>/<ept> pairs and the translator wants to split the segment
at a point that separates the two tags, there are problems because it is
recommended that each <bpt> have its <ept> in the same <source> or <target>.
With two <ph> elements, you can separate them without problems in most
cases.

If you use <g>, it would be necessary to clone the <g> tag in the second
segment, which is messy. If you have already cloned a <g> element and the
translator then merges the two segments, you end up with duplicated <g>
tags. This does not happen if you use <ph> instead.
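
A tool could check a proposed split point before applying it. The
following Python sketch is only an illustration: the function name
unpaired_after_split() is hypothetical, the inline content is handled as a
plain string, and the offset is assumed not to fall inside a tag.

```python
import re

BPT = re.compile(r'<bpt\b[^>]*\bid="([^"]+)"')
EPT = re.compile(r'<ept\b[^>]*\bid="([^"]+)"')


def unpaired_after_split(content, offset):
    """Return ids of <bpt> tags left of offset whose <ept> is right of it."""
    left = content[:offset]
    opened = set(BPT.findall(left))
    closed = set(EPT.findall(left))
    return sorted(opened - closed)


risky = (r'First part <bpt id="1">\b</bpt>starts bold. '
         r'Second part ends bold<ept id="1">\b0</ept>.')
safe = r'One <bpt id="1">\b</bpt>bold<ept id="1">\b0</ept>. Two.'

risky_ids = unpaired_after_split(risky, risky.index("Second"))
safe_ids = unpaired_after_split(safe, safe.index("Two"))
```

A non-empty result, as in risky_ids, means that the split would leave a
<bpt> without its <ept> in the first segment.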

C. Source Format

There are formats that don't require a skeleton, like Java Properties. In
this case it is better to work with <bpt>/<ept>, <ph> and <it>.

For some formats, so-called XLIFF profiles (i.e. representation
recommendations) have already been defined. Accordingly, you should consult
the existing profiles to see if a case like yours has already been covered.

D. Processing Environment

<g> and <x> are appealing because they ensure that format information is
not spread across trans-units. With <bpt>/<ept> this is not guaranteed. In
the example below, the format begin and the format end are not within a
single trans-unit. This may cause trouble when the original format needs to
be reconstructed. It holds true, for example, in environments that use
XSLT-based processing, since difficult recursive calls would be needed.

<trans-unit id="%%%2%%%">
 <source>Text 2 begins <bpt id="2" ctype="x-code" />code starts here.</source>
</trans-unit>
<trans-unit id="%%%3%%%">
 <source>And code ends here.<ept id="2" ctype="x-code" />Now comes next TEXT.</source>
</trans-unit>
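
Such cases can be detected before processing starts. The Python sketch
below is an illustration under stated assumptions: the function name
dangling_codes() is hypothetical, the trans-unit ids are simplified to
"tu2" and "tu3", a <body> wrapper is added to get one root element, and no
XLIFF namespace is used.

```python
import xml.etree.ElementTree as ET


def dangling_codes(root):
    """Map trans-unit ids to bpt/ept ids without a partner in <source>."""
    report = {}
    for unit in root.iter("trans-unit"):
        source = unit.find("source")
        if source is None:
            continue
        opened = {e.get("id") for e in source.iter("bpt")}
        closed = {e.get("id") for e in source.iter("ept")}
        unmatched = sorted(opened ^ closed)  # ids present on one side only
        if unmatched:
            report[unit.get("id")] = unmatched
    return report


doc = ET.fromstring("""<body>
<trans-unit id="tu2">
 <source>Text 2 begins <bpt id="2" ctype="x-code" />code starts here.</source>
</trans-unit>
<trans-unit id="tu3">
 <source>And code ends here.<ept id="2" ctype="x-code" />Now comes next TEXT.</source>
</trans-unit>
</body>""")

report = dangling_codes(doc)
```

For the two trans-units above, report lists code id "2" for both "tu2"
and "tu3", since each holds only one half of the pair.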

============================================================================

Christian Lieske
MultiLingual Technology Solutions (MLT)
SAP Language Services (SLS)
SAP Globalization Services
SAP AG
Dietmar-Hopp-Allee 16
D-69190 Walldorf
Germany
T   +49 (62 27) 7 - 6 13 03
F   +49 (62 27) 7 - 2 54 18
christian.lieske@sap.com
http://www.sap.com