Subject: Re: [xdi] Minutes: XDI TC Telecon Friday 2014-03-21
UTF-8 (and UTF-32) binary sort order is the same as Unicode codepoint order. However, UTF-16 binary order departs from codepoint order: U+E000-U+FFFF would sort above U+10000-U+10FFFF, because the surrogate code units D800-DFFF that encode the supplementary range compare lower than E000-FFFF. Sorting UTF-16 code units in the order 0000, ... , D7FF, E000, E001, ... , FFFF, D800, D801, ... , DFFF will give Unicode codepoint order for well-formed input. ICU is not needed for this.

The Unicode Collation Algorithm addresses issues like making accented letter = plain letter + combining accent. It is a heavyweight algorithm that in practice requires a library such as ICU, and it does not make sense to use it simply for ordering for binary signature generation.
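The code-unit reordering can be sketched in a few lines of Python, as an illustration only and not as anything the TC has specified. The helper names (utf16_units, codepoint_order_key) and the sample strings are made up for this example, and the sketch assumes well-formed input and takes the surrogate block to be D800-DFFF. It moves the surrogate code units above E000-FFFF before comparing, then checks the result against plain codepoint order.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def codepoint_order_key(s):
    """Sort key over UTF-16 code units that yields codepoint order.

    Surrogates (D800-DFFF) are moved to the top of the 16-bit range and
    E000-FFFF is shifted down to fill the gap; 0000-D7FF is unchanged.
    """
    def remap(unit):
        if unit < 0xD800:
            return unit
        if unit < 0xE000:
            return unit + 0x2000  # D800-DFFF -> F800-FFFF
        return unit - 0x0800      # E000-FFFF -> D800-F7FF
    return [remap(u) for u in utf16_units(s)]


# Hypothetical sample strings: one ASCII, one just below the surrogate
# block, one high in the BMP, one supplementary character.
samples = ["\uFF5E", "\U00010000", "\uD7FF", "A"]

by_codepoint = sorted(samples, key=lambda s: [ord(c) for c in s])
by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16_raw = sorted(samples, key=utf16_units)
by_utf16_fixed = sorted(samples, key=codepoint_order_key)
```

With these samples, the UTF-8 byte sort and the remapped UTF-16 sort agree with codepoint order, while the raw UTF-16 sort places U+10000 ahead of U+FF5E.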