Subject: Unicode issues again
All,

I am making the changes in the working draft to fix the Unicode issues. The functions that need to be updated are:

  string-equal
  string-greater-than
  string-greater-than-or-equal
  string-less-than
  string-less-than-or-equal

All of these are described in terms of "byte-for-byte" comparison. However, such a comparison is ambiguous: Unicode strings are sequences of code points (effectively 32-bit characters), and there are many different ways to encode them as bytes, such as UTF-8 and UTF-16. Different encodings lead to different byte-for-byte comparison results. We have decided to use Unicode codepoint collation instead, which does not depend on the encoding and is unambiguous.

I just wanted to highlight the changes for the TC. I will add this text to the function definitions:

"The comparison SHALL use Unicode codepoint collation, as defined for the identifier http://www.w3.org/2005/xpath-functions/collation/codepoint by [XF]."

Also, we have not discussed whether the XACML identifiers of these functions should be changed. I propose that we do not change them, since the behavior in 2.0 was not clearly defined anyway. In particular, I think that a hypothetical (big-endian) UTF-32 encoding of the strings before the byte-for-byte comparison in XACML 2.0 would lead to the result that we now define for 3.0, and the wording in 2.0 suggests that this is what was intended in the first place.

Another Unicode issue that has not been discussed yet concerns case conversion. XACML has case conversion functions for strings, and I think case conversion also depends on the locale. I haven't investigated this yet, though.

Regards,
Erik
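The minimal Python sketch below is not part of the XACML draft; it only illustrates the two points raised above. The helper names `codepoint_compare` and `byte_compare`, and the two sample characters, are hypothetical and chosen for illustration. It assumes that codepoint collation orders strings lexicographically by code point value, which is what comparing lists of `ord()` values does. It shows that UTF-8 and big-endian UTF-32 bytes happen to sort in code point order while UTF-16 does not, and that Python's default case mappings are locale-independent and can change the length of a string.

```python
# Illustrative sketch only; function names are hypothetical, not XACML identifiers.


def three_way(a, b) -> int:
    """Return -1, 0 or 1, like a classic three-way comparison."""
    return (a > b) - (a < b)


def codepoint_compare(x: str, y: str) -> int:
    """Compare strings code point by code point (codepoint collation order)."""
    return three_way([ord(c) for c in x], [ord(c) for c in y])


def byte_compare(x: str, y: str, encoding: str) -> int:
    """'Byte-for-byte' comparison after encoding; the result depends on the encoding."""
    return three_way(x.encode(encoding), y.encode(encoding))


a = "\uFF5E"      # U+FF5E FULLWIDTH TILDE, inside the Basic Multilingual Plane
b = "\U0001F600"  # U+1F600 GRINNING FACE, outside it (a surrogate pair in UTF-16)

print("codepoint", codepoint_compare(a, b))          # -1
print("utf-8    ", byte_compare(a, b, "utf-8"))      # -1 (EF BD 9E < F0 9F 98 80)
print("utf-16-be", byte_compare(a, b, "utf-16-be"))  # 1  (FF 5E > D8 3D DE 00)
print("utf-32-be", byte_compare(a, b, "utf-32-be"))  # -1 (same as code point order)

# Case conversion: Python applies the default, locale-independent Unicode mappings.
print("I".lower() == "i")             # True; Turkish/Azeri rules would give "\u0131" (dotless i)
print("\u0130".lower() == "i\u0307")  # True; capital dotted I becomes i + COMBINING DOT ABOVE
print("\u00DF".upper())               # SS; one character becomes two
```

UTF-16 diverges from code point order only when a character above U+FFFF is compared with a character in the range U+E000 to U+FFFF, because the surrogate code units (U+D800 to U+DFFF) sort below that range. Little-endian UTF-16 or UTF-32 would not preserve code point order either, which is why the UTF-32 remark above only holds for the big-endian form.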