

Subject: RE: [uddi-spec] The V3 to V2 key conversion algorithm documented in 10.1.1 of the 3.0.1 standard


Tony,

 

I agree that the description in section 10.1.1 could be clearer.  At the time that the changes to the algorithm were being discussed, my preferred approach was to delete most of section 10.1.1 and instead refer the reader to the Leach document, as the algorithm presented is essentially the algorithm in section 3.3 of the Leach document with a single fixed namespace for all UDDI V3 keys.  We chose instead to make the minimum changes to the text, and most of your comments apply equally to the original version of 10.1.1.

 

Given the restricted character set of UDDI V3 keys, a subset of ASCII, “the bytes” are simply the ASCII values of the characters of the normalized key.  In my Java implementation of the algorithm I do the normalization by calling the String.toLowerCase method and then produce “the bytes” by calling the String.getBytes method with an argument of “UTF-8”.  Given the restricted character set of the input key, I could have used any of several encoding schemes to produce the same set of bytes.  Implementations in other programming languages would do something similar.

 

The “uddi:” prefix is included in the bytes that are hashed: the prefix is a required part of the key, so k3 in the algorithm must begin with “uddi:”, and no mention is made of removing the prefix/scheme.
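
In outline, my handling of step 1 looks something like the following (a sketch only, not my actual code, and the key used is a made-up example):

import java.security.MessageDigest;

public class V3KeyBytes {

    // Sketch only: step 1 of 10.1.1 - hash "the bytes" of a V3 key.
    // The key passed in is assumed to already be a valid uddiKey, i.e.
    // drawn from the restricted (ASCII subset) character set and
    // beginning with the "uddi:" scheme, which is part of what is hashed.
    public static byte[] hashV3Key(String v3Key) throws Exception {
        String normalized = v3Key.toLowerCase();         // normalization per section 4.4
        byte[] theBytes = normalized.getBytes("UTF-8");  // same octets as US-ASCII here
        return MessageDigest.getInstance("MD5").digest(theBytes);  // 16 octets
    }

    public static void main(String[] args) throws Exception {
        byte[] hash = hashV3Key("uddi:example.org:sales:4711");    // made-up key
        System.out.println(hash.length + " octets");
    }
}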

 

I too was confused by the endian forms in step 2, and I am still not convinced that the little-endian form is correct, but as Java is big-endian I deferred to someone more experienced with the little-endian form.  The string form of a UUID is defined to be effectively big-endian, so the string we end up with is big-endian, but I assume the little-endian form is used internally in languages other than Java that are natively little-endian, and it makes sense to create the UUID correctly in case common code is used to produce the string version, or the UUID is used for more than just producing the string.  The Leach algorithm is described only in the big-endian form, but its final step is to convert the UUID to local byte order, which should produce the same sequence of octets as the little-endian form in step 2 of our algorithm.
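
To be concrete about the string form: producing the canonical UUID string from the sixteen octets is just a matter of writing the hex of the octets in the order they occur, i.e. big-endian, with hyphens after octets 4, 6, 8 and 10.  Which MD5 octet ends up in which of those sixteen positions is, of course, exactly the point under discussion.  A sketch (the class and method names are mine):

public class UuidStringForm {

    // Sketch: the canonical UUID string is the hex of the sixteen octets
    // in the order they occur (network/big-endian order), with hyphens
    // after octets 4, 6, 8 and 10.  How the MD5 octets are arranged into
    // those sixteen positions is the question raised by step 2.
    public static String toUuidString(byte[] octets) {
        StringBuffer sb = new StringBuffer(36);
        for (int i = 0; i < 16; i++) {
            if (i == 4 || i == 6 || i == 8 || i == 10) {
                sb.append('-');
            }
            int b = octets[i] & 0xFF;
            if (b < 0x10) {
                sb.append('0');   // keep two hex digits per octet
            }
            sb.append(Integer.toHexString(b));
        }
        return sb.toString();
    }
}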

 

I don’t think we should change section 4.4 to refer to Unicode 4.  I think that if we are going to switch to Unicode 4 then we need to do it uniformly throughout the specification.  I would think this is more appropriate for UDDI V4.

 

John Colgrave

IBM

 

-----Original Message-----
From: Rogers, Tony [mailto:Tony.Rogers@ca.com]
Sent: 29 August 2003 12:02
To: uddi-spec@lists.oasis-open.org
Subject: [uddi-spec] The V3 to V2 key conversion algorithm documented in 10.1.1 of the 3.0.1 standard

 

As an exercise in testing clarity I thought I'd try implementing the algorithm outlined in 10.1.1. My finding is that it is incompletely specified and could do with more detail.

The first step specifies the use of the "the bytes of the normalized form" of the key. Sounds simple enough...

Comment:  The "normalized form" of the key is documented in section 4.4 - it might be worth pointing at that section from this algorithm. It might also be a good thing to include in the glossary (it isn't there). BTW: section 4.4 refers to a tech report (http://www.unicode.org/unicode/reports/tr21/) which has been superseded by Unicode 4 - should we update our reference?

Problem: what is meant by "the bytes"? I assume this means the bytes of the Unicode representation (given that we are using Unicode), which means that we must worry about endian issues, given that each Unicode character is two bytes, and MD5 operates on bytes rather than characters. Are we requiring a big-endian or little-endian representation? Or are we feeding UTF-8 into the hash? Does this mean we will have issues with UTF-16? We really should specify what is meant by "the bytes".

Problem: is the "uddi:" prefix on the key included in the bytes to be hashed? Or do we hash just the portion after that prefix? There's no statement either way, and the fact that the "uuid:" prefix must be added afterwards for tModel keys adds to the confusion.

Problem: I was very confused by the discussion of endian forms in the second step - I assumed that they were only relevant in considering the data to be converted in the third step, because the MD5 hash outputs bytes (by my reading of it, anyway). If that's the case, then we might be well advised to drop the reference to the document and state explicitly which byte goes where in the final result - it would make things simpler for implementors of this algorithm. If we were to say, for example, that the first two characters of the output are the hex representation of byte[3] of the MD5 hash (that's my reading of it), then there's no confusion. Going from the bytes of the MD5 hash across to the pseudo-words of the UUID format, then back to the bytes that correspond to the hex string, seems unnecessary. (Should a coder care to implement it using words, then they can look out for endian issues for themselves)
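
To make that reading concrete (purely illustrative of my interpretation, not a claim about what the spec intends), the mapping for the first field would be something like:

public class TimeLowReading {

    // Illustration of the reading above, not a statement of what the spec
    // requires: treat hash octets 0-3 as a little-endian 32-bit time_low,
    // then write that word big-endian, so hash[3] supplies the first two
    // characters of the output string.
    public static String timeLowHex(byte[] hash) {
        int timeLow = (hash[3] & 0xFF) << 24
                    | (hash[2] & 0xFF) << 16
                    | (hash[1] & 0xFF) << 8
                    | (hash[0] & 0xFF);
        String hex = Integer.toHexString(timeLow);
        while (hex.length() < 8) {
            hex = "0" + hex;      // pad so leading zero octets are kept
        }
        return hex;               // first two characters == hex of hash[3]
    }
}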

 

I am not getting the right values out of my implementation yet, so I certainly don't have all the answers. Given that I'm far from a novice coder, I think this clearly indicates that we have work to do on this section.

 

Tony (Troublemaker) Rogers


