OASIS Mailing List Archives

office-collab message



Subject: Re: [office-collab] Position XML


On 24.04.2014 16:11, robert_weir@us.ibm.com wrote:
...

I hope that in the actual specification text we can be precise about character counting.  As we all know, with XML we're dealing with lexical strings, which might include character entities, as well as parsed XML, where there are Unicode characters, but even then there are different conventions for dealing with composition sequences, etc.  We probably want to cite a specific Unicode normalization form to do the counting on:

http://www.w3.org/TR/2005/WD-charmod-norm-20051027/
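
To illustrate why the counting basis matters, here is a minimal Python sketch. The sample markup and the helper name count_code_points are made up for this example and are not taken from any specification draft. It counts the same content as a lexical string, as parsed text, and as parsed text under two normalization forms:

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical sample: the same word twice, once with a precomposed "é"
# (U+00E9) and once with "e" followed by a combining acute accent (U+0301),
# both written as character references in the lexical form.
lexical = "<p>caf&#xE9; cafe&#x301;</p>"

# The XML parser resolves the character references into Unicode characters.
parsed = ET.fromstring(lexical).text


def count_code_points(text, form=None):
    """Count code points, optionally after Unicode normalization (hypothetical helper)."""
    if form is not None:
        text = unicodedata.normalize(form, text)
    return len(text)


counts = {
    "lexical": len(lexical),                  # markup and references included
    "parsed": count_code_points(parsed),      # as delivered by the parser
    "NFC": count_code_points(parsed, "NFC"),  # composed form
    "NFD": count_code_points(parsed, "NFD"),  # decomposed form
}
print(counts)
```

All four counts differ for this input, so a position expressed as a character offset would be ambiguous unless the specification names the basis it counts on.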

It looks like "Form C" is what the W3C is recommending for processing, but I am not certain:

http://www.unicode.org/reports/tr15/tr15-25.html#Specification


Note:  This came up in the OpenFormula discussions, since we have spreadsheet functions that deal with extracting substrings at given offsets.  In that case, implementations diverged enough that we were only able to mark some functions as "normalization-sensitive", a form of implementation-dependent behavior.  I really hope that with CT, since we're starting fresh, we can specify exactly what normalization form to use.
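
The substring problem can be sketched the same way. The function mid below is a hypothetical helper with a 1-based start offset, written only for illustration; it is not the OpenFormula definition of any function. It shows how the same call returns different text depending on the normalization form applied before counting:

```python
import unicodedata


def mid(text, start, length, form="NFC"):
    """Return `length` code points starting at 1-based offset `start`,
    counted after normalizing `text` to `form` (hypothetical helper)."""
    normalized = unicodedata.normalize(form, text)
    return normalized[start - 1:start - 1 + length]


# "résumé" stored with combining acute accents (decomposed input).
word = "re\u0301sume\u0301"

composed_result = mid(word, 1, 2, "NFC")    # "r" plus precomposed "é"
decomposed_result = mid(word, 1, 2, "NFD")  # "r" plus bare "e", accent cut off

print(repr(composed_result), repr(decomposed_result))
```

Under the decomposed form the second offset falls between the base letter and its combining accent, which is the kind of divergence that fixing one normalization form in the specification would rule out.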

Good point, and thanks for the references. I will look into them.

Best regards,
Svante

