
cgmo-webcgm message



Subject: implications of URI vs. IRI


All --

When I was putting together the first Unicode tests, Dieter also supplied 
me with this nifty "advanced" test.  It gets into Japanese text for SF text 
like APS ids and names.

It highlights an interesting implication of our decision to stick with URI 
instead of switching to IRI.  URI encoding requires that any non-ASCII 
characters be included via the "URI escaping mechanism"; see WebCGM 3.1.1.4 
[1], and the more detailed XML description [2].  Basically, get the 
**UTF-8** representation of the characters, and replace each byte in that 
representation by the 3-character string %HH, where HH is the hex 
representation of the byte.
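
As a rough illustration of that escaping rule, here is a minimal Python 
sketch.  The function name uri_escape_non_ascii and the sample id are my own 
placeholders, not anything from WebCGM, and the sketch only handles the 
non-ASCII case discussed here (it leaves every ASCII character untouched, 
including ones a full implementation would also have to escape).

```python
def uri_escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character by the %HH escapes of its UTF-8 bytes.

    Hypothetical helper for illustration only; ASCII characters are
    passed through unchanged.
    """
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII character: copied as-is in this simplified sketch.
            pieces.append(ch)
        else:
            # Non-ASCII character: take its UTF-8 bytes and write each
            # byte as '%' followed by two hex digits.
            pieces.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(pieces)


# Assumed sample id: two Japanese characters (not the id from the test file).
sample_id = "日本"
escaped = uri_escape_non_ascii(sample_id)
print(escaped)
```

Each character in this particular sample takes 3 bytes in UTF-8, so the two 
characters become six %HH triplets.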

Consider, for example, the 2-character id of the object in the upper-left 
box, and its use in a link from the object in the upper-right box.

If that id were the two characters c1c2, let's suppose that it could be 
represented by the 4 UTF-8 bytes b1b2b3b4 (I'm just guessing about "4"; 
since UTF-8 is variable length, it could be more).  Then to put that id into 
a URI string, it would have to be the 12-character string:

%hh%hh%hh%hh

where the hh are the 4 pairs of hex digits that represent the 4 UTF-8 
bytes.  I.e., the CGM URI for the link would be:

#id(%hh%hh%hh%hh, view_context)
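
To make the length arithmetic concrete, here is a small Python sketch that 
builds such a fragment and decodes it again, using the standard library's 
urllib.parse.quote and unquote.  The helper name make_id_fragment and the 
sample id are assumptions of mine for illustration; "view_context" is just 
the placeholder used above.

```python
from urllib.parse import quote, unquote


def make_id_fragment(object_id: str, behavior: str = "view_context") -> str:
    """Build a '#id(...)' fragment with the id percent-escaped via UTF-8.

    Hypothetical helper; the fragment layout follows the example in this
    message, not a checked reading of the WebCGM grammar.
    """
    # quote() encodes the string as UTF-8 by default and writes each
    # byte that is not in its safe set as %HH; safe="" also escapes '/'.
    escaped_id = quote(object_id, safe="")
    return f"#id({escaped_id},{behavior})"


# Assumed sample id: two Japanese characters (not the id from the test file).
sample_id = "日本"
utf8_length = len(sample_id.encode("utf-8"))
escaped_length = len(quote(sample_id, safe=""))
fragment = make_id_fragment(sample_id)

print(utf8_length, escaped_length)
print(fragment)

# Decoding the escaped id recovers the original characters.
round_trip = unquote(quote(sample_id, safe=""))
print(round_trip == sample_id)
```

For this sample the id occupies 6 UTF-8 bytes rather than 4, so the escaped 
form is 18 characters long; the escaped length is always three times the 
UTF-8 byte count when every character is non-ASCII.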

Side question.  Does URI (rfc3986 [3]) restrict only the character 
repertoire of the URI, or does it restrict also the encoding?  I.e., can a 
URI be encoded in ascii, isoLatin1, or utf8, or utf16, or whatever, as long 
as it restricts its repertoire to the URI repertoire?  I suspect "yes", but 
I don't know the answer.  It would be interesting for someone to research it.

Thoughts?

Regards,
-Lofton.

[1] http://docs.oasis-open.org/webcgm/v2.0/WebCGM20-IC.html#webcgm_3_1_1_4
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
[3] URI:  http://www.ietf.org/rfc/rfc3986.txt
[4] IRI:  http://www.ietf.org/rfc/rfc3987.txt

japan-SF.png


