Subject: Unicode issues
From: Erik Rissanen <erik@axiomatics.com>
To: XACML TC <xacml@lists.oasis-open.org>
Date: Sun, 02 Nov 2008 15:39:54 +0100

All,

We have previously discussed Unicode issues for our string functions and 
the W3C working draft here:

http://www.w3.org/TR/2005/WD-charmod-norm-20051027/

I posted some questions for clarification about this to their mailing list.

http://lists.w3.org/Archives/Public/www-international/2008OctDec/0004.html

It turns out that the W3C working draft does not meet our needs. After 
thinking the issues through, I have written up the following for the 
next XACML working draft:

A new section:
--8<--
7.1 Unicode issues

In Unicode it is possible to represent some letters by different 
character sequences. The process of converting Unicode strings into 
canonical character sequences is called normalization. An operation is 
normalization-sensitive if its output(s) are different depending on the 
state of normalization of the input(s); if the output(s) are textual, 
they are deemed different only if they would remain different were they 
to be normalized. (Quoted from [CM]).

An XACML implementation MUST NOT perform any normalization-sensitive 
operations unless it has ensured that the inputs are normalized. An 
XACML implementation MUST behave as if each normalization-sensitive 
operation normalizes the string into Unicode normalization form C. An 
implementation MAY use some other form of internal processing as long as 
the externally visible results are identical to this specification.

For more information and specification of normalization forms see [UAX15].
--8<--
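
As an illustration only (not part of the proposed text), the following 
Python sketch shows why a comparison is normalization-sensitive and how 
normalizing to form C removes the difference. It assumes the standard 
library unicodedata module as the normalizer. The helper names to_nfc 
and nfc_equal are hypothetical.

```python
import unicodedata

def to_nfc(s: str) -> str:
    """Return s in Unicode normalization form C."""
    return unicodedata.normalize("NFC", s)

def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return to_nfc(a) == to_nfc(b)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Without normalization the two spellings compare as different strings.
raw_equal = precomposed == decomposed

# After normalization to NFC they are the same code point sequence.
normalized_equal = nfc_equal(precomposed, decomposed)
```

An implementation could equally normalize once when the input arrives, 
or use another internal form, as long as the externally visible results 
are the same.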

The references are:

[CM]     Character Model for the World Wide Web 1.0: 
Normalization, W3C Working Draft, 27 October 2005, 
http://www.w3.org/TR/2005/WD-charmod-norm-20051027/, World Wide Web 
Consortium.

[UAX15]    Davis, Mark, Unicode Standard Annex #15: Unicode 
Normalization Forms, Unicode 5.1, available from 
http://unicode.org/reports/tr15/

In the above-mentioned thread on the www-international mailing list, I 
wrote that string equality would be defined as binary equality of the 
strings when encoded in a common Unicode encoding form. However, I think 
I will stick with what we decided before, that is, "code-point 
collation" as defined in XQuery.
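
As an illustration only, here is a minimal Python sketch of the two 
definitions of string equality mentioned above. The function names 
(code_points, string_equal, binary_equal) are hypothetical and come from 
neither XACML nor XQuery, and UTF-8 is just one assumed choice of common 
encoding form.

```python
from typing import List

def code_points(s: str) -> List[int]:
    """Return the sequence of Unicode code points in s."""
    return [ord(ch) for ch in s]

def string_equal(a: str, b: str) -> bool:
    """Equality by code point: equal only if the code point sequences match."""
    return code_points(a) == code_points(b)

def binary_equal(a: str, b: str, encoding: str = "utf-8") -> bool:
    """Equality of the encoded bytes in one common encoding form (assumed UTF-8)."""
    return a.encode(encoding) == b.encode(encoding)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Both definitions treat the unnormalized spellings as different strings.
same_by_code_point = string_equal(precomposed, decomposed)
same_by_bytes = binary_equal(precomposed, decomposed)
```

For equality, the two definitions give the same answer on well-formed 
strings. Neither one hides a difference in normalization, which is why 
the inputs have to be normalized first.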

Regarding case mapping, I have added the following sentence to the 
existing string-normalize-to-lower-case XACML function: "Case mapping 
shall be done as specified for the fn:lower-case function in [XF] with 
no tailoring for particular languages or environments." [XF] refers to 
http://www.w3.org/TR/2007/REC-xpath-functions-20070123/
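
To illustrate what "no tailoring" means in practice, here is a small 
Python sketch. It assumes that Python's built-in str.lower(), which 
applies the Unicode default case mappings regardless of the process 
locale, is a reasonable stand-in for fn:lower-case. That equivalence is 
an assumption and has not been checked against [XF]. The function name 
is hypothetical.

```python
def normalize_to_lower_case(s: str) -> str:
    """Lower-case s with the default (untailored) Unicode case mappings."""
    # str.lower() does not consult the locale, so no language-specific
    # tailoring (for example the Turkish dotless i) is applied.
    return s.lower()

plain = normalize_to_lower_case("TITLE I")
accented = normalize_to_lower_case("\u00c9COLE")   # E WITH ACUTE + "COLE"
```

Under Turkish tailoring, the capital "I" would map to dotless 
U+0131. With the untailored mapping it maps to the ordinary "i" in 
every environment.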

I also noted that the existing normalize-space XACML function had no 
definition of whitespace. I added (like in XQuery): "The whitespace 
characters are defined in the metasymbol S (Production 3) of [XML]." 
[XML] refers to http://www.w3.org/TR/2006/REC-xml-20060816/
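
As a rough sketch of the effect of this definition, the following Python 
code trims only the four characters matched by the XML production S 
(space, tab, carriage return, line feed). It assumes that the XACML 
function strips leading and trailing whitespace and leaves interior 
whitespace alone. The names XML_WHITESPACE and normalize_space are 
hypothetical.

```python
# Characters matched by production S in XML 1.0: #x20 | #x9 | #xD | #xA
XML_WHITESPACE = "\u0020\u0009\u000d\u000a"

def normalize_space(s: str) -> str:
    """Strip leading and trailing XML whitespace (assumed XACML behaviour)."""
    return s.strip(XML_WHITESPACE)

trimmed = normalize_space(" \t\r\n abc \n")

# NO-BREAK SPACE (U+00A0) is not in production S, so it is left in place.
nbsp_kept = normalize_space("\u00a0abc\u00a0")
```

The explicit character set matters because Python's strip() without an 
argument uses a broader Unicode notion of whitespace and would also 
remove characters such as U+00A0.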

I have also added a section on Unicode security issues.

--8<--
9.3 Unicode security issues

There are many security considerations related to use of Unicode. An 
XACML implementation SHOULD follow the advice given in the relevant 
version of [UTR36].
--8<--

[UTR36] refers to http://unicode.org/reports/tr36/

Best regards,
Erik