Subject: RE: [office] IRI vs URI Discussion Today (2010-09-13)
Good news:

1. I am happy to report that the IRI to URI mapping in [RFC3987] only converts a set of allowed Unicode Characters that are not part of the Basic Latin set. So the appearance of Basic Latin characters and C0+C1 controls has to already be valid for appearance in a URI (or be already %-encoded in a place where %-encodings may appear). This makes some business with the mapping easier than I thought.

2. The IRI specification [RFC3987] makes the valuable statement that "When an IRI is used for resource retrieval, the resource that the IRI locates is the same as the one located by the URI obtained after converting the IRI according to the procedure defined here. This means there is no need to define resolution separately on the IRI level." On the other hand, they don't recommend arbitrarily mapping back and forth, keeping any mapping or attempted inversions to the minimum necessary.

- Dennis

PS: using %62 instead of the letter "b" is definitely not recommended. It should certainly not be done by software. But if it is in an IRI that comes into our possession, it is wise not to change it. The security issues that go with this sort of thing (as a way of obscuring something about a web site or resource) might be handled by how it is presented, but not by automatically adjusting it.

-----Original Message-----
From: Dennis E. Hamilton [mailto:dennis.hamilton@acm.org]
Sent: Monday, September 13, 2010 10:36
To: 'Andreas J. Guelzow'
Cc: 'ODF TC List'
Subject: RE: [office] IRI vs URI Discussion Today (2010-09-13)

Yes, I intentionally used the %62 escape. However, in URIs there is no assurance that the 0x62 byte is intended to be the ASCII/ISO 646 encoding for the letter "b". That's why, among other reasons, the rule for URIs in namespace declarations (and in some other cases) says that the namespaces identified by URIs http://example.com/abc and http://example.com/a%62c are different.
That's why it is important to urge that producers SHOULD NOT %-encode the UTF8 encoding of any Basic Latin Characters that are freely-usable in URIs without any escaping and that consumers SHOULD NOT decode any %-encoding within IRIs in the markup of a consumed document.

- Dennis

FURTHER THOUGHTS

[ ... ]

Here's an odd case. If for some reason the URI mapping of a non-URI IRI is provided as the value of a markup item whose datatype is anyURI, no %-encoding in it should be decoded in submission to a URI/IRI resolver.

In deciding if two IRIs are the same or not, it is probably appropriate to map them both to URIs and see if those are the same. (The mapping should do something rational for those parts of URIs that are not case-sensitive, such as the letters for hexadecimal digits in a %-encoding.)

I am tempted to say in regard to the consumption of ODF documents that mapping to URIs MAY always be done before submission to a resolver, whether or not IRIs are directly acceptable to the resolver. Something tells me this is a natural consequence of the way mapping of IRIs to URIs is defined, but I am not 100% certain of that at this point. I can't imagine an interoperable case without this assurance, however.

[ ... ]
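
To make the mapping and comparison described above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure from [RFC3987] in full: the function names `iri_to_uri`, `normalize_pct_case`, and `iris_equivalent` are hypothetical, the test for "characters outside Basic Latin" is approximated as any code point at or above U+00A0, and Unicode normalization and internationalized host names are ignored. Existing %-encodings are never decoded, so a%62c and abc stay distinct.

```python
import re

# Matches one %-encoded octet such as %62 or %c3.
_PCT_OCTET = re.compile(r"%[0-9A-Fa-f]{2}")


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by %-encoding only non-Basic-Latin characters.

    Assumption: every code point >= U+00A0 is treated as mappable. This
    approximates the character ranges the RFC allows and does not check
    them exactly. Basic Latin characters, C0/C1 controls, and any
    existing %-encodings pass through unchanged.
    """
    pieces = []
    for ch in iri:
        if ord(ch) >= 0xA0:
            # Encode the character as UTF-8, then %-encode each octet.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            pieces.append(ch)
    return "".join(pieces)


def normalize_pct_case(uri: str) -> str:
    """Upper-case the hex digits of each %-encoding without decoding it."""
    return _PCT_OCTET.sub(lambda m: m.group(0).upper(), uri)


def iris_equivalent(a: str, b: str) -> bool:
    """Compare two IRIs by mapping both to URIs first.

    Only the case of hex digits in %-encodings is normalized here.
    Other case-insensitive parts (scheme, host) are left alone.
    """
    return normalize_pct_case(iri_to_uri(a)) == normalize_pct_case(iri_to_uri(b))
```

With this sketch, `iris_equivalent("http://example.com/abc", "http://example.com/a%62c")` is false, matching the namespace rule quoted earlier, while an IRI containing "é" compares equal to its mapped form regardless of whether the hex digits are written as %c3%a9 or %C3%A9.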