office-collab message

Subject: Re: [office-collab] Position XML

From: Svante Schubert <svante.schubert@gmail.com>
To: office-collab@lists.oasis-open.org
Date: Thu, 24 Apr 2014 16:22:59 +0200

On 24.04.2014 16:11, robert_weir@us.ibm.com wrote:

...

I hope that in the actual specification text we can be precise about character counting. As we all know, with XML we're dealing with lexical strings, which might include character entities, as well as parsed XML, where there are Unicode characters. Even then, there are different conventions for dealing with composition sequences, etc. We probably want to cite a specific Unicode normalization form to do the counting on:

http://www.w3.org/TR/2005/WD-charmod-norm-20051027/

It looks like "Form C" is what the W3C is recommending for processing, but I am not certain:

http://www.unicode.org/reports/tr15/tr15-25.html#Specification

Note: This came up in the OpenFormula discussions, since we have spreadsheet functions that deal with extracting substrings at given offsets. In that case, implementations diverged enough that we were only able to mark some functions as "normalization-sensitive", a form of implementation-dependent behavior. I really hope that with CT, since we're starting fresh, we can specify exactly what normalization form to use.
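
To make the counting concern concrete, here is a minimal Python sketch. It is an illustration only, not proposed specification text. The helper name count_code_points and the sample strings are hypothetical, and counting code points is just one possible convention. It shows that the same visible text yields different counts depending on the normalization form, and that a lexical XML string differs in length from its parsed content.

```python
import unicodedata
import xml.etree.ElementTree as ET


def count_code_points(text: str, form: str = "NFC") -> int:
    """Count Unicode code points after normalizing to the given form.

    Hypothetical helper: a real specification would also have to state
    whether positions count code points, UTF-16 code units, or graphemes.
    """
    return len(unicodedata.normalize(form, text))


# The same word written two ways: precomposed U+00E9, and
# "e" followed by the combining acute accent U+0301.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Without normalization the two strings have different lengths.
raw_lengths = (len(precomposed), len(decomposed))

# After normalization both spellings agree, but the agreed count
# depends on which form is chosen.
nfc_lengths = (count_code_points(precomposed, "NFC"),
               count_code_points(decomposed, "NFC"))
nfd_lengths = (count_code_points(precomposed, "NFD"),
               count_code_points(decomposed, "NFD"))

# Lexical XML versus parsed XML: the character reference "&#233;"
# occupies six characters in the source but is one character once parsed.
lexical = "<p>caf&#233;</p>"
parsed = ET.fromstring(lexical).text

print(raw_lengths, nfc_lengths, nfd_lengths, len(parsed))
```

A position such as "offset 4" therefore only identifies the same location in every implementation if the text is parsed first and the normalization form is fixed.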

Good point and thanks for the references. I will have a look into them.

Best regards,
Svante

References:
- Position XML
  - From: Peter Rakyta <rakyta.peter@multiracio.hu>
- Re: [office-collab] Position XML
  - From: Svante Schubert <svante.schubert@gmail.com>
- Re: [office-collab] Position XML
  - From: robert_weir@us.ibm.com