xri-comment message

Subject: Comments on HXRI encoding rules

From: Aragon Gouveia <aragon@phat.za.net>
To: xri-comment@lists.oasis-open.org
Date: Mon, 27 Apr 2009 19:13:41 +0200

Hello,

I have been familiarising myself with the xri-resolution-2.0 committee
draft 03 and have some questions/comments regarding the HXRI encoding
rules specified in 11.4.

I think the specification may be making some dangerous assumptions about
how web servers do, or might, process reserved characters in URIs, and I
think the encoding rules can be simplified.

To elaborate, I will use the fully encoded example from the specification:

https://xri.example.com/=example*r%25E9sum%25E9/path?query
&_xrd_r=application/xrds+xml%3Bhttps=true%3Bsep=true
&_xrd_t=http://example.org/test?a=1%26b=hello%2520plan%25E8te
&_xrd_m=application/atom+xml

My thoughts in point form:

1. RFC 3986 specifies that "=" and "*" are reserved characters. Although
they have a defined meaning in the path component of the XRI URI scheme,
their meaning in the HTTP URI scheme is (AFAIK) undefined, but reserved.
There may be web server software, now or in future, that assigns meaning
to these characters in the path component of HTTP URIs, so I think the
XRI specification should warn XRI proxy software authors that they must
ensure their web server does not process any RFC 3986 reserved characters
that occur in the path component below the base URI of the proxy
service. The same applies to "@", "!", and "$".

2. I cannot find a specification of an XRI's query component. In the HTTP
world (or rather, the HTML world) the norm is for the query to use the
W3C's application/x-www-form-urlencoded format, but as far as I can see
this is not referenced in any of the XRI specifications. A potential
source of conflict is that 11.4 contradicts
application/x-www-form-urlencoded by specifying that "+" characters must
not be used to encode a SPACE. In the HTTP world many web servers assume
the query component follows application/x-www-form-urlencoded and so
will decode "+" into a space. I think the form of an XRI query component
should be more clearly defined, and if it is decided that it won't follow
application/x-www-form-urlencoded, I think 11.4 should clearly warn proxy
authors that they must write their own query decoder instead of relying
on the web server's native decoder (which expects
application/x-www-form-urlencoded input). That is, it should clearly
state that SPACEs must not be encoded as "+" AND that "+" must not be
decoded to SPACEs. As things stand, if the example URI is processed by a
web server, the "+" characters will very likely be converted to SPACEs,
corrupting the contents of the _xrd_r and _xrd_m parameters above.

3. In the query component, the specification suggests that ";" and "&"
should be percent-encoded if they form part of the data; I think "="
should be included too.

4. Is it really necessary to apply double percent-encoding to the UCS
and SPACE characters in the example above? I question this because
double encoding converts those XRI characters into self-contained data
within the HTTP URI, as opposed to simply transforming them into HTTP URI
components. If the specification's intention is the former, then I would
argue that, for example, this:

xri://=example*r%E9sum%E9/path?param1=val1&param2=val2

should actually be encoded as:

https://xri.example.com/%3Dexample%2Ar%25E9sum%25E9%2Fpath%3Fparam1%3Dval1%26param2%3Dval2
?_xrd_r...
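To illustrate point 1, here is a small Python sketch (illustration only;
the helper name reserved_in_path and the second sample path are
hypothetical) that lists the RFC 3986 reserved characters, other than
"/", appearing in an HTTP path. A proxy author could use a check like
this to see which characters their web server must pass through
untouched below the proxy's base URI.

```python
# RFC 3986, section 2.2: reserved = gen-delims / sub-delims
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def reserved_in_path(path):
    """Return the RFC 3986 reserved characters found in an HTTP path,
    ignoring "/" (the path segment separator)."""
    return {ch for ch in path if ch in RESERVED} - {"/"}


# Path of the fully encoded example URI from 11.4.
print(sorted(reserved_in_path("/=example*r%25E9sum%25E9/path")))  # ['*', '=']
# A hypothetical proxy path using other XRI symbols.
print(sorted(reserved_in_path("/@example!1234/$v")))  # ['!', '$', '@']
```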
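For point 2, this sketch runs the example URI from 11.4 through Python's
standard urllib.parse. parse_qs follows application/x-www-form-urlencoded
and turns "+" into a space, while decode_query_no_plus (a hypothetical
helper, not something the specification defines) splits on "&" and on
the first "=", then percent-decodes without touching "+".

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Fully encoded example URI from section 11.4, joined onto one line.
EXAMPLE = (
    "https://xri.example.com/=example*r%25E9sum%25E9/path?query"
    "&_xrd_r=application/xrds+xml%3Bhttps=true%3Bsep=true"
    "&_xrd_t=http://example.org/test?a=1%26b=hello%2520plan%25E8te"
    "&_xrd_m=application/atom+xml"
)


def decode_query_no_plus(query):
    """Hypothetical decoder: split on "&", then on the first "=", and
    percent-decode each part WITHOUT mapping "+" to SPACE."""
    params = {}
    for pair in query.split("&"):
        name, sep, value = pair.partition("=")
        params.setdefault(unquote(name), []).append(unquote(value) if sep else "")
    return params


query = urlsplit(EXAMPLE).query  # everything after the first "?"

form_decoded = parse_qs(query)  # form-urlencoded rules: "+" becomes SPACE
plain_decoded = decode_query_no_plus(query)

print(form_decoded["_xrd_r"])   # ['application/xrds xml;https=true;sep=true']
print(plain_decoded["_xrd_r"])  # ['application/xrds+xml;https=true;sep=true']
```

Note that the single decoding pass leaves _xrd_t as
http://example.org/test?a=1&b=hello%20plan%E8te, i.e. the doubly encoded
octets are still singly encoded afterwards.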
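For point 3, a minimal sketch of encoding a single query parameter so
that "&", ";" and "=" inside the data cannot be confused with
delimiters. encode_param is a hypothetical helper; it also encodes a
literal "+" as %2B, which is stricter than the example in 11.4 and
sidesteps the ambiguity described in point 2.

```python
from urllib.parse import quote


def encode_param(name, value):
    """Hypothetical helper: percent-encode one query parameter.

    safe="/" keeps "/" literal (it is allowed in a query component);
    every other character outside RFC 3986's unreserved set, including
    "&", ";", "=" and "+", is percent-encoded.
    """
    return quote(name, safe="/") + "=" + quote(value, safe="/")


print(encode_param("_xrd_r", "application/xrds+xml;https=true;sep=true"))
# _xrd_r=application/xrds%2Bxml%3Bhttps%3Dtrue%3Bsep%3Dtrue
```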
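For point 4, this sketch shows that encoding the whole XRI as a single
opaque path segment reproduces the encoding suggested above, and that one
round of decoding restores the XRI with its own %E9 octets intact. It
assumes Python's urllib.parse.quote and unquote are acceptable stand-ins
for RFC 3986 percent-encoding; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# The XRI from point 4, without its "xri://" prefix.
XRI_REF = "=example*r%E9sum%E9/path?param1=val1&param2=val2"

# safe="" percent-encodes every reserved character, including "/", "?",
# "=", "&" and "*"; the existing "%" of %E9 becomes %25 (double encoding).
opaque = quote(XRI_REF, safe="")
print("https://xri.example.com/" + opaque)
# https://xri.example.com/%3Dexample%2Ar%25E9sum%25E9%2Fpath%3Fparam1%3Dval1%26param2%3Dval2

# A single round of decoding gives back the original XRI, %E9 octets intact.
assert unquote(opaque) == XRI_REF
```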

I hope I have not misinterpreted anything. Could anyone comment on these
points, please?

Thanks,
Aragon