Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters
Hi Christian,

Comments below.

CL> How about the following?
CL> Unfortunately, XML does not have the capability to contain
CL> all Unicode code points. Due to this, in certain instances
CL> extra syntax is required to represent those code points that
CL> cannot be otherwise represented in element content. These
CL> escapes are only allowed in certain elements, according to
CL> the DTD. (from http://unicode.org/reports/tr35/#Escaping_Characters).
CL> Writers MUST represent these code points of the inline content
CL> using the LDML representation (e.g. <cp hex="0">).

I disagree: in XLIFF, <cp> is an XLIFF representation, not an LDML one. We can certainly point to the source of inspiration, but we also want to take ownership of the element in the XLIFF context.

YS> - Readers MUST process all <cp> elements regardless of
YS> whether their hex value is a valid or invalid XML
YS> code point.
> CL> How can we define "process"?
> YS> Maybe interpret, or convert would be better (more specific).
> CL> How about the following?
CL> Readers must preserve the content of "cp" elements.

There is no "content" in cp as it's an empty element :) And I think "preserve" may be confusing, as it may be seen as related to writing things out after processing. Here we are just saying that all cp elements must be processed. That is: even if the value of "hex" does not correspond to an invalid character, the element should be read and converted into whatever the parsed-content representation is for that specific reader.

Maybe: "Readers MUST read all <cp>..."? But "process" sounds better to me because it implies some kind of transformation.

YS> ...But then, this prevents tools from catching several
YS> errors in one go...
> CL> How about the following?
CL> If the value of the hex attribute is invalid, the Readers
CL> MUST continue in a "detect additional errors" mode
CL> (to gather a list of all errors). In the end, the Readers
CL> MUST generate an error, MUST terminate the process,
CL> and must point to logging information (for the errors).

I don't think we should force a reader to continue after it finds an error. We certainly should allow it to continue to gather more errors if it feels like it, but not make it mandatory. Also, some readers may have no logging mechanism. We should probably stick to general terms when it comes to error handling, like "generate an error".

Maybe: "If the value of the hex attribute is invalid, the Readers MUST generate an error and MAY terminate the process. This specification does not prescribe how invalid <cp> values are represented in the parsed content."

But I still think it would be better to have an expected behavior: it helps interoperability. U+FFFD seems to be applicable for such a case according to http://en.wikipedia.org/wiki/Replacement_character#Replacement_character.

Cheers,
-yves
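To make the writer/reader rules under discussion concrete, here is a minimal sketch in Python. It is illustrative only, not proposed spec text. Assumptions: the function names (write_inline, read_inline, decode_cp) are hypothetical; hex values are written as uppercase with at least 4 digits (the thread only shows <cp hex="0">); a hex value is treated as "invalid" if it is not 1 to 6 hex digits or exceeds U+10FFFF; invalid values are replaced with U+FFFD as suggested above (not an agreed rule); and stop_on_error models the "MAY terminate the process" option. The set of characters XML can hold follows the XML 1.0 "Char" production.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

REPLACEMENT = "\uFFFD"  # U+FFFD, per the suggestion in this thread (assumption)
HEX_RE = re.compile(r"[0-9A-Fa-f]{1,6}")  # hypothetical validity rule for hex


def is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def write_inline(text: str) -> str:
    """Writer side: escape text for element content, emitting <cp hex="..."/>
    for every code point that XML cannot contain."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_xml_char(cp):
            out.append(escape(ch))  # escapes &, < and >
        else:
            out.append('<cp hex="%04X"/>' % cp)  # assumed 4-digit uppercase form
    return "".join(out)


def decode_cp(hex_value: str, errors: list, stop_on_error: bool) -> str:
    """Convert one hex attribute value to a character. Every <cp> is
    converted, even when it denotes a character that is valid in XML."""
    if HEX_RE.fullmatch(hex_value):
        cp = int(hex_value, 16)
        if cp <= 0x10FFFF:
            return chr(cp)
    msg = f"invalid <cp> hex value: {hex_value!r}"
    if stop_on_error:
        raise ValueError(msg)  # reader chooses to terminate
    errors.append(msg)  # reader chooses to continue and collect errors
    return REPLACEMENT


def read_inline(elem: ET.Element, stop_on_error: bool = False):
    """Reader side: rebuild the text of elem, returning (text, errors).
    Inline elements other than <cp> are simply flattened in this sketch."""
    errors: list = []

    def walk(node: ET.Element) -> str:
        parts = [node.text or ""]
        for child in node:
            if child.tag == "cp":
                parts.append(decode_cp(child.get("hex", ""), errors, stop_on_error))
            else:
                parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(elem), errors
```

For example, write_inline("a\x00b<") produces 'a<cp hex="0000"/>b&lt;', and parsing '<source>x<cp hex="ZZ"/>y</source>' with read_inline returns "x\uFFFDy" together with one error message, while passing stop_on_error=True raises instead. Keeping the error list separate from the returned text reflects the point above that not every reader has a logging mechanism.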