
xacml message



Subject: Case conversions


All,

I have looked into case conversion of strings.

We have previously decided to introduce a new function in 3.0:

urn:oasis:names:tc:xacml:3.0:function:string-equal-ignore-case

There is also an existing function in 1.0 called

urn:oasis:names:tc:xacml:1.0:function:string-normalize-to-lower-case

which could use clarification in its description.

For string-equal-ignore-case, Unicode defines a default case folding 
operation which is locale independent. We should use that.

For string-normalize-to-lower-case, Unicode also defines default case 
conversion tables. We should use those.
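
To make the difference between the two operations concrete, here is a 
minimal sketch in Python. The two function names are hypothetical 
stand-ins for the XACML functions, not part of any XACML implementation. 
It assumes that Python's str.casefold() (Unicode case folding) and 
str.lower() (Unicode lower-case mapping) are acceptable approximations of 
the default, locale-independent operations discussed above; whether XACML 
should require exactly these mappings is still open.

```python
import unicodedata


def string_equal_ignore_case(a: str, b: str) -> bool:
    # Hypothetical stand-in for string-equal-ignore-case:
    # compare the case-folded forms, with no locale tailoring.
    return a.casefold() == b.casefold()


def string_normalize_to_lower_case(s: str) -> str:
    # Hypothetical stand-in for string-normalize-to-lower-case:
    # apply the lower-case mapping, with no locale tailoring.
    return s.lower()


if __name__ == "__main__":
    # The results depend on the Unicode tables shipped with the interpreter.
    print("Unicode version in use:", unicodedata.unidata_version)

    # Case folding maps "ß" to "ss", so these compare equal.
    print(string_equal_ignore_case("Straße", "STRASSE"))

    # Lower-casing leaves "ß" unchanged, so comparing lower-cased
    # strings is not the same test as comparing case-folded strings.
    print(string_normalize_to_lower_case("Straße")
          == string_normalize_to_lower_case("STRASSE"))
```

The point of the example is that "lower-case both sides and compare" and 
"case fold both sides and compare" can give different answers, which is 
why the two functions need separate definitions.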

The exact behavior of these would depend on the particular version of 
Unicode in use. XQuery defines "lower-case" in the following way:

--8<--
Returns the value of $arg after translating every character to its 
lower-case correspondent as defined in the appropriate case mappings 
section in the Unicode standard [The Unicode Standard]. For versions of 
Unicode beginning with the 2.1.8 update, only locale-insensitive case 
mappings should be applied. Beginning with version 3.2.0 (and likely 
future versions) of Unicode, precise mappings are described in default 
case operations, which are full case mappings in the absence of 
tailoring for particular languages and environments. Every upper-case 
character that does not have a lower-case correspondent, as well as 
every lower-case character, is included in the returned value in its 
original form.
--8<--

The reference [The Unicode Standard] says that "The version of Unicode 
to be used is implementation-defined, but implementations are 
recommended to use the latest Unicode version; currently, Version 
4.0.00, Addison-Wesley, 2003 ISBN 0-321-18578-1"

The XML 1.0 specification says:

--8<--
Definition: A character is an atomic unit of text as specified by 
ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage 
return, line feed, and the legal characters of Unicode and ISO/IEC 
10646. The versions of these standards cited in A.1 Normative References 
were current at the time this document was prepared. New characters may 
be added to these standards by amendments or new editions. Consequently, 
XML processors MUST accept any character in the range specified for Char.
--8<--

Both of these statements suggest the practice of using the latest version 
of Unicode. The following page

http://unicode.org/versions/

suggests that versions of Unicode are backwards compatible in general.

So, I propose that we do this:

1. State in the XACML specification that the version of Unicode is 
implementation-defined, but it is recommended that the latest version be 
used.

2. Define the string functions in a similar manner to XQuery, that is, 
make references to the default case tables without any locale-specific 
conversions.

3. Add to the security considerations section some explanation that the 
version of Unicode affects some string functions and that care should be 
taken to avoid characters which could cause problems.
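
As a rough illustration of points 1 and 3, the sketch below shows one way 
an implementation could report which Unicode version it uses and flag 
code points that have no assignment in that version. The helper names are 
hypothetical. It also assumes that "characters which could cause 
problems" includes code points that are unassigned in the 
implementation's Unicode tables (general category "Cn"); the proposal 
itself does not define that set.

```python
import unicodedata
from typing import List


def unicode_version_in_use() -> str:
    # Version of the Unicode Character Database used by this interpreter.
    return unicodedata.unidata_version


def unassigned_code_points(s: str) -> List[str]:
    # Code points with general category "Cn" have no assignment in this
    # Unicode version, so their case mappings could change in a later one.
    return [f"U+{ord(ch):04X}" for ch in s if unicodedata.category(ch) == "Cn"]


if __name__ == "__main__":
    print("Unicode version in use:", unicode_version_in_use())
    print(unassigned_code_points("policy"))
```

A check of this kind only reflects the tables of the implementation that 
runs it; two implementations on different Unicode versions can still 
disagree about the same string.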

Regards,
Erik


