OASIS Mailing List Archives

 



xliff-inline message



Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters


Hi,

Please find some comments below (search for CL>> ).

Cheers,
Christian

-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Sunday, August 14, 2011 09:07
To: xliff-inline@lists.oasis-open.org
Subject: RE: [xliff-inline] Req 1.15 Representation of invalid XML characters

Hi Christian,

>> - Writers MUST encode all invalid XML code points 
>> of the inline content using <cp>.
>
CL> We may need to include an explanation of "invalid/valid
> XML code point". We should also note that the "cp" 
> idea is from Unicode (LDML).

Yes. As for "code point", we should probably talk about "character" rather than "code point" here. The character's code point is simply the value of 'hex'.
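To make "invalid/valid XML code point" concrete: assuming XML 1.0 is the target, a character is valid exactly when its code point matches the `Char` production of the XML 1.0 specification (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`). The helper name `is_valid_xml_char` below is hypothetical; XML 1.1 uses a different rule and is not covered by this sketch.

```python
def is_valid_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)              # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF            # excludes other C0 controls
        or 0xE000 <= cp <= 0xFFFD          # skips surrogates, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF       # supplementary planes
    )


print(is_valid_xml_char(0x41))    # 'A' -> True
print(is_valid_xml_char(0x0))     # NUL -> False
print(is_valid_xml_char(0xFFFE))  # noncharacter -> False
```

Everything outside these ranges (for example U+0000 to U+0008, lone surrogates, U+FFFE) is what a Writer would have to express with `<cp>`.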

CL>> How about the following?
CL>> "Unfortunately, XML does not have the capability to contain all Unicode code points. Due to this, in certain instances extra syntax is required to represent those code points that cannot be otherwise represented in element content. These escapes are only allowed in certain elements, according to the DTD." (Quoted from http://unicode.org/reports/tr35/#Escaping_Characters.) Writers MUST represent these code points of the inline content using the LDML representation (e.g. <cp hex="0">).
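A minimal sketch of the Writer rule, assuming XML 1.0 validity rules and assuming the `hex` value is written as uppercase hexadecimal padded to at least four digits with no "U+" prefix (the exact format is not settled in this thread). The function `encode_inline` is hypothetical: characters that XML can carry are escaped normally, and every other character becomes a `<cp hex="..."/>` element.

```python
from xml.sax.saxutils import escape


def is_valid_xml_char(cp: int) -> bool:
    # XML 1.0 'Char' production.
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def encode_inline(text: str) -> str:
    """Serialize `text` as element content, wrapping XML-invalid characters in <cp>."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if is_valid_xml_char(cp):
            parts.append(escape(ch))  # escapes &, < and >
        else:
            # Assumed hex format: uppercase, at least 4 digits, no "U+" prefix.
            parts.append('<cp hex="%04X"/>' % cp)
    return "".join(parts)


print(encode_inline("Ctrl+C=\x03 & more"))
# Ctrl+C=<cp hex="0003"/> &amp; more
```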

> - Readers MUST process all <cp> elements regardless
> of whether their hex value is a valid or invalid XML
> code point.
>
CL> How can we define "process"?

Maybe "interpret" or "convert" would be better (more specific).

CL>> How about the following?
CL>> Readers must preserve the content of "cp" elements.
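One way to read "convert" on the Reader side is sketched below, under stated assumptions: the inline content is parsed with Python's `xml.etree.ElementTree`, `<cp>` elements are un-namespaced, and other inline elements are simply flattened to their text. The helpers `parse_cp_hex` and `decode_inline` are hypothetical. Every `<cp>` is converted to its character whether or not that character would be valid in XML, and a malformed `hex` value raises `ValueError` instead of being silently accepted.

```python
import xml.etree.ElementTree as ET

HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_cp_hex(value: str) -> int:
    """Convert a <cp> hex attribute to a code point, or raise ValueError."""
    # Checked explicitly because int(x, 16) also accepts "0x41", "+41", " 41" and "4_1".
    if not value or any(c not in HEX_DIGITS for c in value):
        raise ValueError(f"invalid hex value: {value!r}")
    cp = int(value, 16)
    if cp > 0x10FFFF:
        raise ValueError(f"hex value out of Unicode range: {value!r}")
    return cp


def decode_inline(elem: ET.Element) -> str:
    """Rebuild the text of `elem`, converting each <cp> back to its character."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "cp":
            parts.append(chr(parse_cp_hex(child.get("hex", ""))))
        else:
            parts.append(decode_inline(child))  # simplification: flatten other inline codes
        parts.append(child.tail or "")
    return "".join(parts)


src = ET.fromstring('<source>Ctrl+C=<cp hex="0003"/> &amp; <cp hex="41"/></source>')
print(repr(decode_inline(src)))  # 'Ctrl+C=\x03 & A'
```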

> ... If the process is not terminated, the code point 
> with the error MUST be replaced with a question 
> mark character (U+003F). [[or should we use U+FFFD?]]
>
CL> I am not sure about either option. I would rather tend
> towards a character (or even a string) that makes its
> origin (namely a replacement stemming from a
> process related to an invalid hex code) clear.

U+FFFD would be the closest character for that. But I suppose a string expression could be better, something like "[!invalid-cp-hex:'hex:badvalue'!]"?

CL>> I searched for hints on how applications deal with the LDML "cp" element. However, I did not find anything.
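The two fall-back options discussed above can be compared in a small sketch. The function `cp_to_text` is hypothetical, and the marker syntax is only modeled on the string floated in the thread (with the bad value inserted verbatim); neither behavior is an agreed processing expectation.

```python
REPLACEMENT_CHAR = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER
HEX_DIGITS = "0123456789abcdefABCDEF"


def cp_to_text(hex_value: str, use_marker: bool = False) -> str:
    """Return the character for a <cp> hex value, or a fall-back if it is invalid."""
    valid = (
        hex_value != ""
        and all(c in HEX_DIGITS for c in hex_value)
        and int(hex_value, 16) <= 0x10FFFF
    )
    if valid:
        return chr(int(hex_value, 16))
    if use_marker:
        # Hypothetical marker syntax: makes the origin of the replacement visible.
        return f"[!invalid-cp-hex:'{hex_value}'!]"
    # Single-character fall-back.
    return REPLACEMENT_CHAR


print(cp_to_text("41"))                   # A
print(cp_to_text("G1", use_marker=True))  # [!invalid-cp-hex:'G1'!]
```

U+FFFD keeps the text length change to one character but loses the bad value; the marker string preserves it for diagnosis at the cost of injecting visible text into the content.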

This raises the question of error handling in general in the processing expectations. An error is a problem that should not be dismissed, and allowing a "fall-back" like this may lead to bad practices. The bottom line is that the file should be fixed. Maybe the expectation should be:

- If the value of the hex attribute is invalid, the Readers MUST generate an error and MUST terminate the process.

But then, this prevents tools from catching several errors in one go...

CL>> How about the following?
CL>> If the value of the hex attribute is invalid, the Readers MUST continue in a "detect additional errors" mode (to gather a list of all errors). At the end, the Readers MUST generate an error, MUST terminate the process, and MUST point to logging information (for the errors).
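The proposed "detect additional errors" behavior amounts to scanning the whole document, collecting every bad `hex` value, logging them, and only then failing once. A minimal sketch, assuming `xml.etree.ElementTree` parsing, un-namespaced `<cp>` elements, and a hypothetical logger name `xliff-reader` and exception class `CpValidationError`:

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("xliff-reader")  # hypothetical logger name
HEX_DIGITS = "0123456789abcdefABCDEF"


class CpValidationError(Exception):
    """Raised once, after all invalid <cp> elements have been collected."""


def _is_valid_hex(value: str) -> bool:
    return (
        value != ""
        and all(c in HEX_DIGITS for c in value)
        and int(value, 16) <= 0x10FFFF
    )


def validate_cp_elements(root: ET.Element) -> None:
    errors = []
    # Keep scanning after the first bad value ("detect additional errors" mode).
    for i, cp in enumerate(root.iter("cp")):
        hex_value = cp.get("hex", "")
        if not _is_valid_hex(hex_value):
            errors.append(f"<cp> #{i}: invalid hex value {hex_value!r}")
    if errors:
        for message in errors:
            logger.error(message)
        # Terminate only after the whole document has been checked.
        raise CpValidationError(
            f"{len(errors)} invalid <cp> element(s); see log 'xliff-reader' for details"
        )


doc = ET.fromstring('<source>a<cp hex="0"/>b<cp hex="ZZ"/>c<cp hex="110000"/></source>')
try:
    validate_cp_elements(doc)
except CpValidationError as exc:
    print(exc)  # 2 invalid <cp> element(s); see log 'xliff-reader' for details
```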

-ys






