Subject: Case conversions
All,

I have looked into case conversion of strings. We have previously decided to introduce a new function in 3.0:

urn:oasis:names:tc:xacml:3.0:function:string-equal-ignore-case

There is also an existing function in 1.0 called

urn:oasis:names:tc:xacml:1.0:function:string-normalize-to-lower-case

whose description could use clarification.

For string-equal-ignore-case, Unicode defines a default case folding operation which is locale independent. We should use that.

For string-normalize-to-lower-case, Unicode also defines default case conversion tables. We should use those.

The exact behavior of both functions would depend on the particular version of Unicode in use.

XQuery defines "lower-case" in the following way:

--8<--
Returns the value of $arg after translating every character to its lower-case correspondent as defined in the appropriate case mappings section in the Unicode standard [The Unicode Standard]. For versions of Unicode beginning with the 2.1.8 update, only locale-insensitive case mappings should be applied. Beginning with version 3.2.0 (and likely future versions) of Unicode, precise mappings are described in default case operations, which are full case mappings in the absence of tailoring for particular languages and environments. Every upper-case character that does not have a lower-case correspondent, as well as every lower-case character, is included in the returned value in its original form.
--8<--

The reference [The Unicode Standard] says that "The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 4.0.00, Addison-Wesley, 2003 ISBN 0-321-18578-1"

The XML 1.0 specification says:

--8<--
Definition: A character is an atomic unit of text as specified by ISO/IEC 10646:2000 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.

The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors MUST accept any character in the range specified for Char.
--8<--

Both of these statements suggest the practice of using the latest version of Unicode. The following page

http://unicode.org/versions/

suggests that versions of Unicode are backwards compatible in general.

So, I propose that we do this:

1. State in the XACML specification that the version of Unicode is implementation defined, but that it is recommended to use the latest version.

2. Define the string functions in a similar manner to XQuery, that is, by reference to the default case tables without any locale-specific conversions.

3. Add to the security considerations section an explanation that the version of Unicode affects some string functions, and that care should be taken to avoid characters which could cause problems.

Regards,
Erik
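
To illustrate the difference between default case folding and default lower-case mapping, here is a minimal Python sketch. It is only an approximation of the proposal, not a definition of the XACML functions: the two function names are hypothetical stand-ins for the XACML function identifiers, and the sketch assumes that Python's built-in str.casefold() and str.lower() (both locale independent) are acceptable substitutes for the Unicode default case folding and default lower-case conversion. The Unicode version is whatever the Python build ships with, which mirrors the "implementation defined" wording in point 1.

```python
import unicodedata

# Version of the Unicode database bundled with this Python build.
# Results of the functions below may differ between versions.
UNICODE_VERSION = unicodedata.unidata_version


def string_equal_ignore_case(a: str, b: str) -> bool:
    """Hypothetical stand-in for string-equal-ignore-case.

    Compares the two strings after locale-independent case folding.
    """
    return a.casefold() == b.casefold()


def string_normalize_to_lower_case(s: str) -> str:
    """Hypothetical stand-in for string-normalize-to-lower-case.

    Applies locale-independent lower-case mapping; characters without
    a lower-case correspondent are returned unchanged.
    """
    return s.lower()


if __name__ == "__main__":
    print("Unicode version in use:", UNICODE_VERSION)
    # Case folding maps the German sharp s to "ss", lower-casing does not,
    # so the two approaches can disagree about equality.
    print(string_equal_ignore_case("Straße", "STRASSE"))
    print(string_normalize_to_lower_case("Straße")
          == string_normalize_to_lower_case("STRASSE"))
```

The sharp s case shows why comparing lower-cased strings is not the same operation as comparing case-folded strings, which is one reason to specify the two functions separately.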