OASIS Mailing List Archives

xliff-comment message


Subject: RE: Preferred method of representing invalid XML chars in <source>?

Kristian Walsh,

I've not heard of invalid characters in an XML document. Usually,
non-printable characters are represented by their character reference. I
frequently use '&#13;&#10;' to represent CR/LF.

For example,

<trans-unit id="a920cf">
	<source xml:lang="en">Three tabs follow&#08;&#08;&#08; then the text
continues</source>
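
A quick way to see how a particular parser treats a numeric character reference is to feed it a small document. The sketch below is only an illustration and is not part of the thread: it assumes Python's standard xml.etree.ElementTree module, and the helper name parses() is made up here. XML 1.0 restricts character references to characters matching its Char production, so references to most control codes below U+0020 are normally reported as not well-formed, while tab, LF and CR are accepted.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted as well-formed by ElementTree."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# CR/LF written as character references, as in the '&#13;&#10;' habit above.
crlf_doc = "<source>line one&#13;&#10;line two</source>"

# U+0008 written as character references, as in the trans-unit example.
ctrl_doc = "<source>Three tabs follow&#8;&#8;&#8; then the text</source>"

for name, doc in (("CR/LF", crlf_doc), ("U+0008", ctrl_doc)):
    print(name, "accepted" if parses(doc) else "rejected")
```

Other parsers may report the problem differently, so it is worth running the same check against whatever toolchain will consume the XLIFF file.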


Doug Domeny
Software Analyst

Ektron, Inc.
+1 603 594-0249 x212

-----Original Message-----
From: Kristian Walsh [mailto:listreader@byteform.com]
Sent: Monday, July 26, 2004 6:28 AM
To: xliff-comment@lists.oasis-open.org
Subject: Preferred method of representing invalid XML chars in <source>?


I am developing an application which creates XLIFF 1.0 documents from
source data. Unfortunately, this source data sometimes contains
character codes below U+0020, most of which (all except tab, line feed
and carriage return) are invalid in an XML document.
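
For reference, the set of characters XML 1.0 allows is given by its Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). A minimal sketch of a check against that production is shown here; the function names are hypothetical and the sample string simply reuses the U+0008 codes from the examples further down.

```python
def is_xml10_char(ch):
    """Return True if the single character ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_positions(text):
    """List (index, hex code) pairs for characters that XML 1.0 does not allow."""
    return [(i, f"{ord(c):04X}") for i, c in enumerate(text) if not is_xml10_char(c)]


sample = "Three tabs follow\x08\x08\x08 then the text continues"
print(illegal_positions(sample))
```

Running a check like this over the source data before export shows exactly which characters need special handling and which (tab, LF, CR) can be written as they are.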

I am unsure of the "canonical" way to deal with this in XLIFF 1.0
(version 1.1 is not an option for this application); as far as I can
see, <x/>, <g> and <ph> can all be used for this purpose, as below:

Form 1: <x>

<trans-unit id="a920cf">
	<source xml:lang="en">Three tabs follow<x id="a920d0"
ctype="character" clone="yes" ts="MyTool:chars=0008,0008,0008"/> then
the text continues</source>

Form 2: <g>

<trans-unit id="a920cf">
	<source xml:lang="en">Three tabs follow<g id="a920d0"
ctype="character" clone="yes" ts="MyTool:chars">0008,0008,0008</g> then
the text continues</source>

Form 3: <ph>

<trans-unit id="a920cf">
	<source xml:lang="en">Three tabs follow<ph id="a920d0"
ctype="character" ts="MyTool:chars">0008,0008,0008</ph> then the text
continues</source>
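
To make the comparison concrete, here is a rough sketch of how the <ph> form could be produced from raw source text. It is only an illustration under stated assumptions, not a claim that <ph> is the preferred form: the function name and the numeric id counter are hypothetical, the ctype and ts values are copied from the example above, and Python's re and xml.sax.saxutils modules are used purely for convenience.

```python
import re
from xml.sax.saxutils import escape

# Runs of C0 control codes other than tab (09), LF (0A) and CR (0D).
_ILLEGAL_RUN = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]+")


def to_source_markup(text, first_id=1):
    """Escape text for XML and wrap each run of illegal controls in <ph>."""
    parts = []
    pos = 0
    next_id = first_id  # hypothetical id scheme: a simple counter
    for match in _ILLEGAL_RUN.finditer(text):
        # Ordinary text before the run: escape &, < and >.
        parts.append(escape(text[pos:match.start()]))
        # The run itself becomes a comma-separated list of hex codes.
        codes = ",".join(f"{ord(c):04X}" for c in match.group())
        parts.append(
            f'<ph id="{next_id}" ctype="character" ts="MyTool:chars">{codes}</ph>'
        )
        next_id += 1
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


raw = "Three tabs follow\x08\x08\x08 then the text continues"
print('<source xml:lang="en">' + to_source_markup(raw) + "</source>")
```

The same loop could emit the <x/> or <g> form by changing only the formatted string, so the choice between the three forms does not affect how the characters are detected.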

So my two questions are:

  1. Which of the above forms is preferred in XLIFF 1.0 for representing
non-XML characters inside source (and/or target) data?

  2. Is there a standard ctype attribute value for "raw character codes"?

Any ideas would be greatly appreciated,
