OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

xdi message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [xdi] Minutes: XDI TC Telecon Friday 2014-03-21


I don't really understand this table, but I was wondering whether the encoding is actually relevant to sorting.

How does sorting really work with Unicode?
Are we supposed to do byte sorting (in which case the encoding would matter).
Or are we supposed to do character sorting, isn't this something that should be specified somewhere in Unicode, independently of the encoding that is used?

Maybe this is what we're looking for. http://www.unicode.org/reports/tr10/

Markus


On Tue, Mar 25, 2014 at 8:46 AM, Joseph Boyle <planetwork@josephboyle.net> wrote:

On Mar 23, 2014, at 5:11 AM, Markus Sabadello <markus.sabadello@xdi.org> wrote:

Finally, we talked about Unicode and how the differences between UTF-8 and UTF-16 may affect ordering and therefore signatures. Joseph explained that Java internally uses UTF-16, whereas XDI serializations require UTF-8 encoding.

Markus will review the relevant XDI2 code sections to see if this is an issue. In Java, the popular ICU4j library may be needed to produce correct results.


I think it is enough to alter the comparison of the top digit of two code units so that D > E, D > F:

   | 0-C  D  EF
0-C|  =   <   <
D  |  >   =   >
EF |  >   <   =


This procedure gives a result for unpaired surrogates rather than throwing an error, but screening out unpaired surrogates is a separate issue.







[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]