Subject: Comments on HXRI encoding rules
Hello,

I have been familiarising myself with the xri-resolution-2.0 committee draft 03 and have some questions/comments regarding the HXRI encoding rules specified in 11.4. I think the specification might be making some dangerous assumptions about how web servers do (or might) process reserved characters in URIs, and I think the encoding rules can be simplified.

To elaborate, I will give the fully encoded example from the specification:

    https://xri.example.com/=example*r%25E9sum%25E9/path?query
    &_xrd_r=application/xrds+xml%3Bhttps=true%3Bsep=true
    &_xrd_t=http://example.org/test?a=1%26b=hello%2520plan%25E8te
    &_xrd_m=application/atom+xml

My thoughts in point form:

1. RFC3986 specifies that "=" and "*" are reserved characters. Although they have a defined meaning in the path component of the XRI URI scheme, their meaning in the HTTP URI scheme (AFAIK) is undefined, but reserved. It is possible that there is (or may be in future) web server software that assigns meaning to these characters in the path component of its local HTTP URI scheme, so I think the XRI specification should warn XRI proxy software authors that they must ensure their web server does not process any RFC3986 reserved characters that occur in the path component below the base URI of the proxy service. The same should be said for "@", "!", and "$".

2. I cannot find the specification of an XRI's query component. In the HTTP world (or rather, the HTML world) the norm is for it to use the W3C's application/x-www-form-urlencoded form, but this is not referenced in any of the XRI specifications as far as I can see. A potential source of conflict is that 11.4 contradicts the application/x-www-form-urlencoded form by specifying that "+" characters must not be used to encode a SPACE. In the HTTP world many web servers assume the query component follows the application/x-www-form-urlencoded form and so will decode "+" into a SPACE.

   I think the form of XRI's query component should be more clearly defined, and if it is decided that it won't follow application/x-www-form-urlencoded, I think 11.4 should clearly warn proxy authors that they must write their own query decoder instead of relying on an HTTP native decoder (which expects the application/x-www-form-urlencoded form). I.e. it should clearly state that SPACEs must not be encoded to "+" AND that "+" must not be decoded to SPACEs. Right now, if the example URI is processed by a web server, the "+" characters will very likely be converted to SPACEs, corrupting the contents of the _xrd_r and _xrd_m parameters above.

3. In the query component, the specification suggests that ";" and "&" should be percent-encoded if they form part of data, but I think it would be good to include "=" characters too.

4. Is it really necessary to perform double percent-encoding on the UCS and SPACE characters in the example above? The reason I question this is that by performing double encoding one is converting those XRI characters into self-contained data within the HTTP URI, as opposed to simply transforming them into HTTP URI components. If it is the specification's intention to do the former, then I would argue that, for example, this:

       xri://=example*r%E9sum%E9/path?param1=val1&param2=val2

   should actually be encoded as:

       https://xri.example.com/%3Dexample%2Ar%25E9sum%25E9%2Fpath%3Fparam1%3Dval1%26param2%3Dval2 ?_xrd_r...

I hope I have not misinterpreted anything. Can anyone comment on these points please?

Thanks,
Aragon
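
To illustrate points 2 and 4, here is a minimal Python sketch. It uses the standard library's urllib.parse as a stand-in for "an HTTP native decoder"; actual web servers and proxy frameworks may behave differently. The function names (decode_form_style, decode_percent_only, encode_as_opaque_data) are hypothetical and not taken from any XRI specification. The query string is the one from the specification's example with the line breaks removed.

```python
from urllib.parse import parse_qs, quote, unquote

# Query component of the specification's example, joined into one string.
QUERY = (
    "query"
    "&_xrd_r=application/xrds+xml%3Bhttps=true%3Bsep=true"
    "&_xrd_t=http://example.org/test?a=1%26b=hello%2520plan%25E8te"
    "&_xrd_m=application/atom+xml"
)


def decode_form_style(query):
    """Decode as application/x-www-form-urlencoded ("+" becomes SPACE)."""
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def decode_percent_only(query):
    """Hand-written decoder: split on "&" and the first "=", then
    percent-decode once, leaving "+" untouched."""
    result = {}
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        result[unquote(name)] = unquote(value)
    return result


def encode_as_opaque_data(xri_part):
    """Percent-encode every character outside the unreserved set,
    including "%" itself, so the XRI travels as self-contained data."""
    return quote(xri_part, safe="")


form = decode_form_style(QUERY)
strict = decode_percent_only(QUERY)

print(form["_xrd_r"])    # application/xrds xml;https=true;sep=true
print(strict["_xrd_r"])  # application/xrds+xml;https=true;sep=true
print(encode_as_opaque_data("=example*r%E9sum%E9/path?param1=val1&param2=val2"))
```

With the form-style decoder the media types in _xrd_r and _xrd_m lose their "+", while the percent-only decoder preserves them. The last call reproduces the fully encoded path suggested in point 4.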