OASIS Mailing List Archives

xliff-inline message

Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters

Hi Yves,

Comments to comments below ... I used the CLCL> marker


-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Wednesday, 24 August 2011 06:34
To: xliff-inline@lists.oasis-open.org
Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters

Hi Christian,

Comments below.

CL> How about the following?
CL> Unfortunately, XML does not have the capability to contain 
CL> all Unicode code points. Due to this, in certain instances 
CL> extra syntax is required to represent those code points that 
CL> cannot be otherwise represented in element content. These 
CL> escapes are only allowed in certain elements, according to 
CL> the DTD. (from http://unicode.org/reports/tr35/#Escaping_Characters).
CL> Writers MUST represent these code points of the inline content 
CL> using the LDML representation (e.g. <cp hex="0">).

I disagree: In XLIFF <cp> is an XLIFF representation, not an LDML one.
We can certainly point to the source of inspiration, but we also want to take ownership of the element in the XLIFF context. 

CLCL> I think this merits additional discussion. I could see advantages to using elements from the LDML namespace in the generic inline markup (or XLIFF).
CLCL> That would, for example, be in line with the "do not reinvent the wheel" mantra.
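
As a rough illustration of the writer-side rule proposed above, the Python sketch below replaces every code point that the XML 1.0 Char production excludes with a <cp> element. The function names and the four-digit uppercase hex format are assumptions made for this example; the thread does not fix either of them.

```python
from xml.sax.saxutils import escape


def is_xml10_char(code: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


def write_inline(text: str) -> str:
    """Serialize inline text, using <cp> for code points XML cannot hold.

    Hypothetical helper: the attribute format (%04X) is an assumption.
    """
    out = []
    for ch in text:
        if is_xml10_char(ord(ch)):
            out.append(escape(ch))  # normal XML escaping of &, <, >
        else:
            out.append('<cp hex="%04X"/>' % ord(ch))
    return "".join(out)
```

For instance, write_inline("a\x00b") yields a, then a <cp> element with hex="0000", then b, while ordinary text only gets the usual XML escaping.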

YS> - Readers MUST process all <cp> elements regardless 
YS> of whether their hex value is a valid or an invalid 
YS> XML code point.
CL> How can we define "process"?
YS> Maybe interpret, or convert would be better (more specific).
CL> How about the following?
CL> Readers must preserve the content of "cp" elements.

There is no "content" in cp as it's an empty element :)
And I think "preserve" may be confusing as it may be seen as related to writing things out after processing.

Here we are just saying that all cp elements must be processed. That is: even if the value of "hex" does not correspond to an invalid XML character, it should be read and converted into whatever the parsed content representation is for that specific reader.

Maybe: "Readers MUST read all <cp>..."? But "process" sounds better to me because it implies some kind of transformation.

CLCL> I think this merits additional discussion. I was under the impression that we want to enable, among other things, the "roundtripping" of non-Unicode code points. Thus, they would need to be written out.
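
To make "process" a little more concrete, here is a minimal Python sketch of a reader that converts every <cp> element into the corresponding character of its parsed content, whether or not that character would have been valid in XML. The wrapper element name "source" in the usage note, the function name, and the choice of a plain Python string as the parsed representation are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def read_inline(xml_fragment: str) -> str:
    """Flatten an element's inline content, converting <cp> to characters.

    Hypothetical helper: other child elements are skipped here, but the
    text that follows them is kept.
    """
    root = ET.fromstring(xml_fragment)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "cp":
            # Convert every <cp>, valid XML character or not.
            parts.append(chr(int(child.get("hex"), 16)))
        parts.append(child.tail or "")
    return "".join(parts)
```

Reading '<source>a<cp hex="0"/>b<cp hex="41"/></source>' gives a string holding a, U+0000, b, and A. A tool that wants to roundtrip can then write the U+0000 back out as <cp> when it serializes.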

YS> ...But then, this prevents tools from catching 
YS> several errors in one go...
CL> How about the following?
CL> If the value of the hex attribute is invalid, the Readers 
CL> MUST continue in a "detect additional errors" mode 
CL> (to gather a list of all errors). In the end, the Readers
CL> MUST generate an error, MUST terminate the process, 
CL> and must point to logging information (for the errors).

I don't think we should force a reader to continue after it finds an error.
We certainly should allow it to continue to gather more errors if it feels like it, but not make it mandatory.
Also, some readers may have no logging mechanism. We should probably stick to general terms when it comes to error handling, like "generate an error".

CLCL> Missing logging mechanisms are a good point ...

Maybe: "If the value of the hex attribute is invalid, the Readers MUST generate an error and MAY terminate the process. This specification does not prescribe how invalid <cp> values are represented in the parsed content."

But I still think it would be better to have an expected behavior: it helps interoperability. U+FFFD seems to be applicable for such a case, according to http://en.wikipedia.org/wiki/Replacement_character#Replacement_character.

CLCL> I would be tempted to reach out to someone from LDML (or general Unicode) to get guidance.
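
One possible reading of the "MUST generate an error and MAY terminate" wording, combined with the U+FFFD suggestion, is sketched below in Python. Everything here is an assumption for discussion: the function name, the stop_on_error flag, the definition of "invalid" (not one to six hex digits, or above U+10FFFF), and reporting errors through a plain list rather than a logging mechanism.

```python
import re

REPLACEMENT_CHARACTER = "\uFFFD"
_HEX = re.compile(r"[0-9A-Fa-f]{1,6}")


def decode_cp(hex_value, errors, stop_on_error=False):
    """Convert the hex attribute of a <cp> element into a character.

    Hypothetical helper. On an invalid value it always generates an
    error: either by raising (terminate) or by recording a message and
    substituting U+FFFD (continue, so more errors can be gathered).
    """
    if hex_value is not None and _HEX.fullmatch(hex_value):
        code = int(hex_value, 16)
        if code <= 0x10FFFF:
            return chr(code)
    message = "invalid hex value on <cp>: %r" % (hex_value,)
    if stop_on_error:
        raise ValueError(message)
    errors.append(message)
    return REPLACEMENT_CHARACTER
```

A reader that wants to report several problems in one go would call decode_cp with a shared errors list and inspect it at the end; a reader without any such mechanism can pass stop_on_error=True and fail on the first bad value.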

