Subject: RE: [office] IRI vs URI Discussion Today (2010-09-13)
Yes, I intentionally used the %62 escape. However, in URIs there is no assurance that the 0x62 byte is intended to be the ASCII/ISO 646 encoding for the letter "b". That's why, among other reasons, the rule for URIs in namespace declarations (and in some other cases) says that the namespaces identified by URIs http://example.com/abc and http://example.com/a%62c are different.

That's why it is important to urge that producers SHOULD NOT %-encode the UTF8 encoding of any Basic Latin Characters that are freely-usable in URIs without any escaping and that consumers SHOULD NOT decode any %-encoding within IRIs in the markup of a consumed document.

 - Dennis

FURTHER THOUGHTS

One cannot prevent the use of Basic Latin Characters that are not freely-usable in URIs because they may be required to express a URI of some origin. I would suggest that if such Basic Latin characters must be used, they always be represented by %-encoding of their one-byte UTF8 codes in all IRIs that employ them. I'm not sure if it is appropriate to say that much in the ODF specification. (On the other hand, we do have a say in determining which %-encodings are ever needed in making IRI references to same-document and same-package resources in ODF documents.)

Here's an odd case. If for some reason the URI mapping of a non-URI IRI is provided as the value of a markup item whose datatype is anyURI, no %-encoding in it should be decoded in submission to a URI/IRI resolver.

In deciding if two IRIs are the same or not, it is probably appropriate to map them both to URIs and see if those are the same. (The mapping should do something rational for those parts of URIs that are not case-sensitive, such as the letters for hexadecimal digits in a %-encoding.)

I am tempted to say in regard to the consumption of ODF documents that mapping to URIs MAY always be done before submission to a resolver, whether or not IRIs are directly acceptable to the resolver.
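
A minimal sketch of the comparison suggested above, in Python. The function names (iri_to_uri, normalize_hex_case, same_iri) are hypothetical, chosen only for illustration. The mapping here is a simplification: it %-encodes the UTF-8 bytes of non-ASCII characters and nothing else, so it ignores internationalized host names, Unicode normalization, and the case-insensitivity of scheme and host. Existing %-escapes are never decoded, which keeps "abc" and "a%62c" distinct.

```python
import re

# Matches one %-escape: a percent sign followed by two hex digits.
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (an assumption, not the full RFC 3987 rule).

    ASCII characters, including any existing %-escapes, pass through
    unchanged. Each non-ASCII character is replaced by the %-encoding
    of its UTF-8 bytes.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


def normalize_hex_case(uri: str) -> str:
    """Upper-case the hex digits of every %-escape; change nothing else."""
    return _PCT_ESCAPE.sub(lambda m: m.group(0).upper(), uri)


def same_iri(a: str, b: str) -> bool:
    """Compare two IRIs by mapping both to URIs and comparing the results.

    No %-escape is decoded, so an escaped Basic Latin letter is NOT
    treated as equal to the bare letter.
    """
    return normalize_hex_case(iri_to_uri(a)) == normalize_hex_case(iri_to_uri(b))


if __name__ == "__main__":
    # Escaped and unescaped "b" stay different, as for namespace names.
    print(same_iri("http://example.com/abc", "http://example.com/a%62c"))
    # A non-URI IRI and its URI mapping compare equal, whatever the hex case.
    print(same_iri("http://example.com/r\u00e9sum\u00e9",
                   "http://example.com/r%c3%a9sum%C3%A9"))
```

Under these assumptions the first comparison yields False and the second True: only the hexadecimal digits of an escape are case-folded, and no escape is ever turned back into a character.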

Something tells me that this freedom to map to URIs first is a natural consequence of the way mapping of IRIs to URIs is defined, but I am not 100% certain of that at this point. I can't imagine an interoperable case without this assurance, however.

 - Dennis

-----Original Message-----
From: Andreas J. Guelzow [mailto:andreas.guelzow@concordia.ab.ca]
Sent: Monday, September 13, 2010 08:49
To: dennis.hamilton@acm.org
Cc: ODF TC List
Subject: Re: [office] IRI vs URI Discussion Today (2010-09-13)

On Mon, 2010-09-13 at 09:00 -0600, Dennis E. Hamilton wrote:
> It is conceivable (and permissible) that IRI references
> "abc" and "a%62c" resolve to different resources (since they are different
> URIs).

According to my reading of RFC 2396, this may not be correct. "%62" is the escaped encoding (in the sense of RFC 2396 2.4.1) of the character b.

Note specifically in 2.4.2:

   Because the percent "%" character always has the reserved purpose of
   being the escape indicator, it must be escaped as "%25" in order to
   be used as data within a URI.

[That doesn't really mean that everybody does this right. A little test showed me that Firefox does not consider them the same in the <first> part of the URI, but the Apache server accessed seems to be happy with it in the <second> portion of the URI.]

Andreas

--
Andreas J. Guelzow
Concordia University College of Alberta