office-formula message

Subject: RE: [office-formula] Summary 2010-09-07 of OpenFormula meeting

I don't understand the statement about surrogate pairs.

My understanding is that the surrogate code values (U+D800 through U+DFFF)
are not valid Unicode scalar values, and no character is assigned to them.

Surrogate pairs are, in my understanding, only meaningful in the UTF-16
encoding. The surrogate range is set aside within the Unicode code space as
a convenience and for historical reasons.  Using a surrogate code value on
its own, as if it were a character, is ill-formed.

(You should never see a surrogate value in well-formed UTF-8-encoded Unicode
text, for example, nor in UTF-32.)
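
As a rough illustration of the point above, here is a small Python sketch
(Python and the sample character U+1D11E are choices made for this example,
not anything from the meeting). It shows that a character outside the Basic
Multilingual Plane yields a surrogate pair only in UTF-16, and that Python's
strict encoders refuse a lone surrogate. The helper name encodes_strictly is
hypothetical.

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
clef = "\U0001D11E"

# In UTF-16 the single code point becomes two 16-bit code units.
utf16 = clef.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# In UTF-8 and UTF-32 the same code point is encoded directly,
# with no surrogate values anywhere in the byte stream.
utf8 = clef.encode("utf-8")
utf32 = clef.encode("utf-32-be")


def encodes_strictly(text, encoding):
    """Return True if text can be encoded with the default strict handler."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A lone high surrogate is not a character and cannot be encoded.
lone_surrogate = "\ud834"
print([hex(u) for u in units], utf8, utf32)
print(encodes_strictly(lone_surrogate, "utf-8"))
```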

My reading of XPath section 3.6 is that it emphatically does not assume
UTF-16, and it warns that implementers handling UTF-16 encodings should take
care to treat a surrogate pair as the single Unicode code point that the
pair represents.  (The XPath specification describes a character as a
Unicode abstract character with a single corresponding Unicode scalar
value.)
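
To make "treat the pair as a single code point" concrete, here is a minimal
Python sketch of how an implementation that stores strings as 16-bit code
units might recover code points. The function name code_points_from_utf16
and the choice to pass unpaired surrogates through unchanged are assumptions
for illustration; a real implementation might reject them instead.

```python
def code_points_from_utf16(units):
    """Combine UTF-16 code units into code points (hypothetical helper)."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate is one code point.
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            out.append(0x10000 + (high << 10) + low)
            i += 2
        else:
            # A BMP code unit, or an unpaired surrogate passed through as-is.
            out.append(unit)
            i += 1
    return out


# "A" followed by U+1D11E: three code units, but two characters.
sample_units = [0x0041, 0xD834, 0xDD1E]
sample_points = code_points_from_utf16(sample_units)
print([hex(p) for p in sample_points])
```

Counting or indexing by the result, rather than by the raw code units, is
what keeps a function like CODE or LEN from splitting a character in half.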

Although XPath warns that comparing two strings that are not normalized the
same way may give unexpected results, I don't believe that XPath requires or
provides any normalization.  (Nor should we, IMHO.)
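
The kind of "unexpected result" at issue can be sketched in Python with the
standard unicodedata module. The sample strings are chosen for this example
only: both display as "é", but they are different code point sequences until
they are normalized to the same form.

```python
import unicodedata

# Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed = "\u00e9"
# Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# A plain code-point comparison treats the two strings as different.
equal_raw = composed == decomposed

# After normalizing both to NFC, the comparison succeeds.
equal_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(equal_raw, equal_nfc, len(composed), len(decomposed))
```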

 - Dennis

-----Original Message-----
From: David A. Wheeler [mailto:dwheeler@dwheeler.com] 
Sent: Tuesday, September 07, 2010 09:01
To: office-formula@lists.oasis-open.org
Subject: [office-formula] Summary 2010-09-07 of OpenFormula meeting

Summary 2010-09-07 of OpenFormula meeting

[ ... ]

* OFFICE-2663

Wheeler: MacOS imposes a different normalization than everyone else.

Eike: CODE is a bad example, it depends on the code page.

Weir: For Unicode, can we just say implementation-defined, but must be
first Unicode character or "logical" value?  What's that?

Wheeler: Could we just say Unicode codepoints?

Patrick: No.  If you look at the XPath language, it says surrogate pairs
should be treated specially.

Wheeler: XPath doesn't require normalization, it just warns
about "unexpected results"; can we do the same?

[ ... ] 
