
Subject: Unicode issues again
From: Erik Rissanen <erik@axiomatics.com>
To: XACML TC <xacml@lists.oasis-open.org>
Date: Fri, 26 Sep 2008 14:39:54 +0200
All,

I am making the changes in the working draft to fix the Unicode issues.

The functions which need to be updated are:

string-equal
string-greater-than
string-greater-than-or-equal
string-less-than
string-less-than-or-equal

All of these are described in terms of "byte-for-byte" comparison.
However, such a comparison is ambiguous, since Unicode strings are
sequences of code points, and there are many different ways to encode
these sequences into byte streams, such as UTF-8, UTF-16, etc. Different
encodings lead to different byte-for-byte comparison results.
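
To illustrate the ambiguity, here is a minimal Python sketch. The two
sample code points and the helper name bytewise_less are my own choices
for illustration and do not come from the XACML specification. It
compares the same two one-character strings byte-for-byte under UTF-8
and under big-endian UTF-16, and the two encodings disagree about which
string is smaller.

```python
# Two strings, each consisting of a single code point.
a = "\ue000"       # U+E000, inside the Basic Multilingual Plane
b = "\U00010000"   # U+10000, the first code point above the BMP


def bytewise_less(x: str, y: str, encoding: str) -> bool:
    """Hypothetical helper: lexicographic comparison of the encoded bytes."""
    return x.encode(encoding) < y.encode(encoding)


# UTF-8:     EE 80 80  versus  F0 90 80 80
utf8_result = bytewise_less(a, b, "utf-8")
# UTF-16BE:  E0 00     versus  D8 00 DC 00 (a surrogate pair)
utf16_result = bytewise_less(a, b, "utf-16-be")

print("UTF-8 says a < b: ", utf8_result)
print("UTF-16 says a < b:", utf16_result)
```

Under UTF-8 the first string sorts before the second, while under
UTF-16 the surrogate pair makes the second string sort first.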

We have decided to use Unicode code point collation instead, which does
not depend on the encoding and is unambiguous.

I just wanted to highlight the changes to the TC. I will add this text 
to the function definitions:

"The comparison SHALL use Unicode codepoint collation, as defined for 
the identifier 
http://www.w3.org/2005/xpath-functions/collation/codepoint by [XF]."
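
As a rough sketch of what code point collation means for the five
functions listed above, the following Python compares strings as
sequences of code point values. The Python function names are only my
transliteration of the XACML function names, and the sketch assumes a
Python str is a sequence of code points (true for Python 3). It is not
meant as a normative definition.

```python
def codepoints(s: str) -> list:
    """Return the string as a list of integer code point values."""
    return [ord(ch) for ch in s]


# Lists of integers compare element by element, and a proper prefix
# sorts before the longer string, which matches code point order.
def string_equal(a: str, b: str) -> bool:
    return codepoints(a) == codepoints(b)


def string_greater_than(a: str, b: str) -> bool:
    return codepoints(a) > codepoints(b)


def string_greater_than_or_equal(a: str, b: str) -> bool:
    return codepoints(a) >= codepoints(b)


def string_less_than(a: str, b: str) -> bool:
    return codepoints(a) < codepoints(b)


def string_less_than_or_equal(a: str, b: str) -> bool:
    return codepoints(a) <= codepoints(b)


# No normalization is applied: precomposed U+00E9 and the sequence
# "e" + U+0301 render alike but are different code point sequences.
same = string_equal("\u00e9", "e\u0301")
print(same)
```

Note that this kind of comparison is purely numeric, so, for example,
all uppercase ASCII letters sort before all lowercase ones.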

Also, we didn't discuss whether the XACML function identifiers should
be changed. I propose that we do not change them, since the behavior in
2.0 was not clearly defined anyway. In particular, I think that a
hypothetical UTF-32 encoding before the comparison in XACML 2.0 would
lead to the result which we now define for 3.0, and the wording in 2.0
suggests that this is what was intended in the first place.
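
The UTF-32 claim can be checked informally with a small Python sketch.
It assumes big-endian UTF-32 without a byte order mark, which is my
assumption and is not stated in 2.0. The sample strings and helper
names are made up for illustration. With little-endian UTF-32 the
byte-for-byte order would not match code point order.

```python
import itertools

# Hypothetical sample strings, including code points above the BMP.
samples = ["", "a", "ab", "b", "\ue000", "\uffff", "\U00010000", "z\U0001F600"]


def utf32be_bytewise_less(x: str, y: str) -> bool:
    """Byte-for-byte comparison after encoding as big-endian UTF-32."""
    return x.encode("utf-32-be") < y.encode("utf-32-be")


def codepoint_less(x: str, y: str) -> bool:
    """Comparison of the sequences of code point values."""
    return [ord(c) for c in x] < [ord(c) for c in y]


# Check every ordered pair of samples.
agree = all(
    utf32be_bytewise_less(x, y) == codepoint_less(x, y)
    for x, y in itertools.product(samples, repeat=2)
)

# Little-endian counterexample: U+0100 is 00 01 00 00 and U+0001 is
# 01 00 00 00, so the bytes say U+0100 < U+0001.
le_disagrees = "\u0100".encode("utf-32-le") < "\u0001".encode("utf-32-le")

print(agree, le_disagrees)
```

This only tests a handful of strings, so it supports the claim rather
than proving it.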

Another Unicode issue which has not been discussed yet concerns case
conversion. We have case conversion functions for strings in XACML, and
I think case conversion also depends on the locale. I haven't
investigated this yet, though.
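
A well-known example of locale dependence is the Turkish dotted and
dotless i. The sketch below is in Python, whose str.upper() applies the
locale-independent default Unicode mapping. The helper upper_tr and its
one-entry table are hypothetical and cover only this single special
case. It is meant to show the kind of difference involved, not to
suggest a definition for the XACML functions.

```python
# Default (locale-independent) Unicode uppercasing.
default_upper = "istanbul".upper()

# In Turkish, lowercase "i" uppercases to U+0130 (capital I with dot
# above). Hypothetical one-entry table for illustration only.
TURKISH_UPPER = {"i": "\u0130"}


def upper_tr(s: str) -> str:
    """Hypothetical Turkish-aware uppercasing for this one special case."""
    return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in s)


turkish_upper = upper_tr("istanbul")

print(default_upper)
print(turkish_upper)
```

The two results differ in their first character (U+0049 versus U+0130),
so a string comparison after case conversion can give different answers
depending on which mapping is used.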

Regards,
Erik