
Subject: Fwd: Handling escaped characters in Translation Units
From: Paul Gampe <pgampe@redhat.com>
To: xliff@lists.oasis-open.org
Date: Mon, 23 May 2005 12:28:46 +1000
Dear TC, the xliff-tools project would greatly appreciate your insight on the
following problem they have been discussing:

----------  Forwarded Message  ----------

Subject: Handling escaped characters in Translation Units
Date: Tuesday 10 May 2005 16:52
From: Asgeir Frimannsson <asgeirf@redhat.com>
To: Paul Gampe <pgampe@redhat.com>
Cc: Jim Hogan <j.hogan@qut.edu.au>

Hi Paul,

Here's an issue we've been discussing at length on the xliff-tools
 mailing list, a discussion initiated by Yves Savourel last week. I believe
 this is an issue that needs a recommended approach from the XLIFF TC. Let me
 know what you think :)

Handling Escaped Characters in Translation Units

In source code, it is very common to use escape sequences for characters
 like newline (\u000A) and horizontal tab (\u0009).

For example:

printf("Please Enter the following Data:\n\
\t- First Name\n\
\t- Last Name\n");

Here we've used the escape sequences '\n' and '\t' to represent newlines and
 tabs.

This fragment would be represented in PO as follows:

msgid ""
"Please Enter the following Data:\n"
"\t- First Name\n"
"\t- Last Name\n"


This could be mapped to XLIFF using two different approaches:

Approach A:

We could preserve the escaped characters:

<source>Please Enter the following Data:\n\
\t- First Name\n\t- Last Name\n</source>

We could further enhance this by abstracting the escaped characters to <ph>
 elements:

<source>Please Enter the following Data:<ph id='1' ctype='lb'>\n</ph>\
<ph id='2' ctype='x-ht'>\t</ph>- First Name<ph id='3' ctype='lb'>\n</ph>\
<ph id='4' ctype='x-ht'>\t</ph>- Last Name<ph id='5' ctype='lb'>\n</ph>\
</source>
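
As a rough sketch of the <ph> abstraction above (Python is my choice here, and
the function name escapes_to_ph and the CTYPES table are made up for
illustration), a filter could walk the escaped string and wrap each '\n' and
'\t' in a numbered <ph> element. It assumes only these two escapes get
placeholders; everything else stays as plain text.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from escape sequence to ctype, using the two values
# that appear in the message above.
CTYPES = {r"\n": "lb", r"\t": "x-ht"}

# Match a backslash plus the character after it, so that an escaped
# backslash is consumed as a pair and never mistaken for the start of "\n".
ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def escapes_to_ph(text):
    """Wrap each \\n and \\t escape sequence in a numbered <ph> element."""
    parts, last, next_id = [], 0, 1
    for match in ESCAPE_RE.finditer(text):
        seq = match.group(0)
        if seq not in CTYPES:
            continue  # other escapes stay in the text untouched
        parts.append(escape(text[last:match.start()]))
        parts.append("<ph id='%d' ctype='%s'>%s</ph>" % (next_id, CTYPES[seq], seq))
        next_id += 1
        last = match.end()
    parts.append(escape(text[last:]))
    return "<source>" + "".join(parts) + "</source>"


msgid = r"Please Enter the following Data:\n\t- First Name\n\t- Last Name\n"
print(escapes_to_ph(msgid))
```

The ids are simply assigned in order of appearance; a real filter would also
need a policy for the remaining escapes.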

Issue A-1: If using this approach, would filters have to discard real newline
 characters (\u000A) in translation units? How would this affect TM lookups?

Issue A-2: How would editors handle this approach? For software messages,
 they would have to disable entering newlines and somehow format the
 message according to the values of the ctype attributes. (Not having visual
 indicators for, e.g., newlines would not be a very translator-friendly
 approach.)

Issue A-3: Where do we stop? In Java .properties files we usually add a
 "\u0020" to indicate a leading space. For example:

my_message = \u0020Some Text

Should this be represented as:

<source>\u0020Some Text</source>
or
<source> Some Text</source>
?

Approach B:

Many of the escaped characters have native Unicode values we could use in
 XLIFF. We could replace '\t' with a real TAB (\u0009) character, and do the
 same for other escape sequences, giving us the following XLIFF fragment:

<source>Please Enter the following Data:
	- First Name
	- Last Name
</source>
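
A minimal sketch of the conversion in Approach B, again in Python and with
made-up names (unescape, reescape, and the DECODE table are assumptions, not
part of any existing filter). It decodes a small, fixed set of escapes on
extraction and re-encodes them on back-conversion, and rejects any escape it
does not know instead of guessing.

```python
import re

# Escapes handled by this sketch; anything else is rejected rather than guessed.
DECODE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
ENCODE = {value: "\\" + key for key, value in DECODE.items()}


def unescape(text):
    """Replace escape sequences with the characters they stand for."""
    def decode(match):
        try:
            return DECODE[match.group(1)]
        except KeyError:
            raise ValueError("unsupported escape: " + match.group(0))
    return re.sub(r"\\(.)", decode, text, flags=re.DOTALL)


def reescape(text):
    """Inverse of unescape(), used when writing the translated file."""
    return "".join(ENCODE.get(ch, ch) for ch in text)


msgid = r"Please Enter the following Data:\n\t- First Name\n\t- Last Name\n"
assert reescape(unescape(msgid)) == msgid
```

The round trip only holds for the escapes listed in DECODE, which is why the
sketch raises on anything else.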

Issue B-1: DOS/Windows use "\r\n", while UNIX (and most programming
 languages) use "\n" as line endings. On back-conversion, how would we know
 whether to write "\n" or "\r\n" in the translated source file?
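
One possible way around B-1, sketched in Python under the assumption that the
filter can keep a small note per file or per unit (where that note would live
in XLIFF is exactly the open question): remember the original line-ending
style at extraction time, store only "\n" in the translation unit, and put the
original style back when writing the translated file. The function names are
hypothetical.

```python
def normalise_newlines(text):
    """Return (text using bare LF, the line-ending style that was found)."""
    style = "\r\n" if "\r\n" in text else "\n"
    return text.replace("\r\n", "\n"), style


def restore_newlines(text, style):
    """Rewrite every line ending in the translated text to the given style."""
    return text.replace("\r\n", "\n").replace("\n", style)


source, style = normalise_newlines("First Name\r\nLast Name\r\n")
translated = source.upper()  # stand-in for the translator's work
print(repr(restore_newlines(translated, style)))
```

A string that mixes both styles would come back with a single style, so this
is lossy for that case.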

Issue B-2: There are some escape sequences used in PO (and probably other
 source formats?) that stand for characters XML does not allow, for example
 "\a" (\u0007, the Alert or Bell control character). How should these be
 handled? (Yes, asking the developer what that character is doing in a
 localised message is a good start.)
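
To make B-2 concrete, here is a small Python sketch that flags characters
falling outside the Char production of XML 1.0 (tab, LF, CR, and the ranges
#x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The helper names are invented
for this example; what a filter should do with the characters it finds is
still the open question.

```python
def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_illegal(text):
    """List (index, code point) for every character XML 1.0 would reject."""
    return [(i, "U+%04X" % ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ch)]


print(find_illegal("Done\a\n"))  # the Bell character (\u0007) is reported
```

A filter could run such a check after decoding escapes and fall back to
keeping the offending sequence in escaped form.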

Conclusion

It would be good to have a recommended approach for handling this, which all
 representation guides could share.

The full archived discussion on this is available at:
http://lists.freedesktop.org/archives/xliff-tools/2005-May/000169.html

cheers,
asgeir
