xliff-inline message

Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters

From: Yves Savourel <ysavourel@enlaso.com>
To: <xliff-inline@lists.oasis-open.org>
Date: Wed, 24 Aug 2011 06:34:07 +0200

Hi Christian,

Comments below.


CL> How about the following?
CL> Unfortunately, XML does not have the capability to contain 
CL> all Unicode code points. Due to this, in certain instances 
CL> extra syntax is required to represent those code points that 
CL> cannot be otherwise represented in element content. These 
CL> escapes are only allowed in certain elements, according to 
CL> the DTD. (from  http://unicode.org/reports/tr35/#Escaping_Characters).
CL> Writers MUST represent these code points of the inline content 
CL> using the LDML representation (e.g. <cp hex="0">).

I disagree: In XLIFF <cp> is an XLIFF representation, not an LDML one.
We can certainly point to the source of inspiration, but we also want to take ownership of the element in the XLIFF context. 
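
For illustration only, the writer-side requirement could be sketched as below in Python. The function names are hypothetical, and the four-digit uppercase form of the hex value is an assumption, not something agreed in this thread. The validity test follows the Char production of XML 1.0.

```python
from xml.sax.saxutils import escape


def is_valid_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def write_inline_text(text: str) -> str:
    """Serialize inline text, using <cp> for code points XML cannot hold."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_valid_xml_char(cp):
            # Ordinary character: only the usual XML escaping is needed.
            out.append(escape(ch))
        else:
            # Assumed format: at least four uppercase hex digits.
            out.append('<cp hex="%04X"/>' % cp)
    return "".join(out)
```

With this sketch, a string containing U+0000 between "a" and "b" is written as a<cp hex="0000"/>b.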


YS> - Readers MUST process all <cp> elements regardless of 
YS> whether their hex value is a valid or an invalid XML 
YS> code point.
>
CL> How can we define "process"?
>
YS> Maybe interpret, or convert would be better (more specific).
>
CL> How about the following?
CL> Readers must preserve the content of "cp" elements.

There is no "content" in cp as it's an empty element :)
And I think "preserve" may be confusing as it may be seen as related to writing things out after processing.

Here we are just saying that all cp elements must be processed. That is: even if the value of "hex" does not correspond to an invalid XML character, it should be read and converted into whatever the parsed content representation is for that specific reader.

Maybe: "Readers MUST read all <cp>..."? But "process" sounds better to me because it implies some kind of transformation.
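
To make "process" concrete, here is a minimal reader-side sketch in Python. It assumes the parsed content representation is a plain string and that the inline content sits in an element without namespaces; the function name and the sample element name are hypothetical. Every <cp> is converted, whether or not its hex value would have been a valid XML character on its own.

```python
import xml.etree.ElementTree as ET


def read_inline_text(element) -> str:
    """Flatten inline content to a string, converting every <cp> element."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "cp":
            # No validity check here: valid and invalid XML code points
            # are both turned into the character they stand for.
            # A well-formed hex attribute is assumed in this sketch.
            parts.append(chr(int(child.get("hex"), 16)))
        else:
            parts.append(read_inline_text(child))
        # Text that follows the child element inside the same parent.
        parts.append(child.tail or "")
    return "".join(parts)


sample = ET.fromstring('<source>a<cp hex="0001"/>b<cp hex="0041"/>c</source>')
text = read_inline_text(sample)
```

In this example the first <cp> stands for U+0001, which XML cannot contain, and the second for U+0041, which it can; both end up in the resulting string.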


YS> ...But then, this prevents tools from catching several 
YS> errors in one go...
>
CL> How about the following?
CL> If the value of the hex attribute is invalid, the Readers 
CL> MUST continue in a "detect additional errors" mode 
CL> (to gather a list of all errors). In the end, the Readers
CL> MUST generate an error, MUST terminate the process, 
CL> and must point to logging information (for the errors).

I don't think we should force a reader to continue after it finds an error.
We certainly should allow it to continue to gather more errors if it feels like it, but not make it mandatory.
Also some readers may have no logging mechanism. We should probably stick to general terms when it comes to error handling, like "generate an error".

Maybe: "If the value of the hex attribute is invalid, the Readers MUST generate an error and MAY terminate the process. This specification does not prescribe how invalid <cp> values are represented in the parsed content."

But I still think it would be better to have an expected behavior: it helps interoperability. U+FFFD seems to be applicable for such cases, according to http://en.wikipedia.org/wiki/Replacement_character#Replacement_character.
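
As a rough sketch of that expected behavior, in Python: an invalid hex value is reported as an error and U+FFFD is substituted, so a reader that wants to can keep going and gather further errors. What counts as "invalid" here (not 1 to 6 hex digits, or above U+10FFFF) is an assumption for the example, as are the function name and the idea of collecting errors in a list.

```python
import re

REPLACEMENT_CHARACTER = "\uFFFD"
# Assumption: a valid value is 1 to 6 hexadecimal digits and nothing else.
_HEX_PATTERN = re.compile(r"[0-9A-Fa-f]{1,6}")


def decode_cp(hex_value, errors: list) -> str:
    """Return the character for a <cp> hex value, or U+FFFD if invalid.

    Problems are appended to `errors`; the caller decides whether to stop
    at the first one or to continue and report them all at the end.
    """
    if hex_value is not None and _HEX_PATTERN.fullmatch(hex_value):
        code_point = int(hex_value, 16)
        if code_point <= 0x10FFFF:
            return chr(code_point)
    errors.append("invalid hex value in <cp>: %r" % (hex_value,))
    return REPLACEMENT_CHARACTER
```

A reader that prefers to terminate can raise as soon as the list is non-empty; one that prefers to report everything in one go can check the list after parsing.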

Cheers,
-yves

References:
- RE: [xliff-inline] Req 1.15 Representation of invalid XML characters
  - From: "Lieske, Christian" <christian.lieske@sap.com>