Subject: encoding of items on path urls
On the way back from the f2f, I did some more testing of which chars can't be used inside b in http://x/a=b/c=d style URLs.

Runs of "/" and runs of "\" are both collapsed to a single "/". This is expected to be a common problem, and it occurs even if the slashes and backslashes are URL encoded.

Even when URL encoding is used ("?" as %3f etc), the following quoted chars caused problems for .NET 2003:

" %&'*:><" [not the quotes, but including the space]

There may be work-arounds, but we should work out of the box for ASP.NET etc.

The following are OK (some need to be URL encoded):

0123456789abcdefABCDEF_-=?#,!£^(){}[]+`¬\"+;|.,~$@

Given that:

1) the above seems arbitrary, and who knows what other problems we may find with other Web servers and portal URL construction schemes;

2) the consumer's Web server does one URL decode on the way in, so the producer should not decode (again). This is likely to cause much confusion;

3) real-world implementations will add a digital signature to navigationalState and interactionState and encrypt them, so that consumers (and Web browsers) can't read or modify a Portlet's state.

In effect, the URI scheme used by the consumer (and its encoding scheme) is unknown; it could be a yet-to-be-invented URI type that does not use % style URL encoding. So the conservative advice from http://www.w3.org/Addressing/rfc1738.txt should be to only make use of non-reserved chars known to be always safe, independent of the encoding scheme:

<quote>Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. </quote>

i.e. alphanumerics and the special characters "$-_.+!*'(),". ":" is already used by us for "wsrp:normal" etc, so we may have to accept it as well.

My advice, then, is that we should limit simple URL tokens to alphanumerics (0..9, a..z, A..Z) with ":_-" as separators, and require that complex values, such as nav state, be binHex encoded.
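To make the simple-token rule and the double-decode concern in point 2 concrete, here is a minimal Python sketch. The names is_simple_token and SIMPLE_TOKEN are made up for this illustration and are not part of WSRP; the sketch assumes "alphanumeric" means ASCII letters and digits only.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical rule from the proposal: ASCII alphanumerics plus ":", "_", "-".
SIMPLE_TOKEN = re.compile(r"[0-9A-Za-z:_-]+")


def is_simple_token(value: str) -> bool:
    """True if value contains only the proposed safe characters (and is non-empty)."""
    return SIMPLE_TOKEN.fullmatch(value) is not None


# Double-decode illustration: a literal value that happens to contain "%3f".
original = "a%3fb"
on_the_wire = quote(original, safe="")      # percent-encode it once for the URL
decoded_once = unquote(on_the_wire)         # what the consumer's Web server does
decoded_twice = unquote(decoded_once)       # what happens if the producer decodes again

print(is_simple_token("wsrp:normal"))       # a constant using ":" as separator
print(is_simple_token("a/b"))               # "/" is outside the safe set
print(on_the_wire, decoded_once, decoded_twice)
```

A single decode recovers the original value, while the second decode silently turns the literal "%3f" into "?", which is the kind of confusion point 2 warns about.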
By binHex I mean the XML Schema xs:hexBinary type (http://www.w3.org/TR/xmlschema-2/#hexBinary), and we could change our WSDL to reflect this. However, hex encoding doubles the length of the value, so producers may prefer to limit URL state to alphanumerics. This means the public WSDL should use xs:string, but a private (producer) WSDL could change these to xs:hexBinary so that decoding is automatic.

In summary:

- WSRP cannot dictate the URI scheme (or encoding scheme) used by a consumer, so only <quote>alphanumerics, the special characters "$-_.+!*'(),", </quote> are guaranteed to be safe wrt RFC1738.txt.
- XML Schema's xs:hexBinary is recommended for non-alphanumeric wsrp URL token values.
- ":" is used for wsrp constants.

What do people on this SC think, before taking such a proposal to the larger group?

regards,
Andre
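For illustration only, a minimal Python sketch of the hexBinary idea and its length cost. The helper names to_hex_binary and from_hex_binary are hypothetical, and the sample state bytes are invented; upper-case digits are used because that is the canonical xs:hexBinary form.

```python
def to_hex_binary(state: bytes) -> str:
    """Encode opaque state as xs:hexBinary text: two hex digits per byte."""
    return state.hex().upper()


def from_hex_binary(token: str) -> bytes:
    """Decode xs:hexBinary text back to the original bytes."""
    return bytes.fromhex(token)


# Invented example state containing characters that were troublesome in URLs.
state = b"page=2/sort=\\name&x"
token = to_hex_binary(state)

print(token)
print(len(state), len(token))               # the token is twice as long as the state
print(from_hex_binary(token) == state)      # round trip is lossless
```

The resulting token contains only 0-9 and A-F, so it survives any URL encoding or path handling on the consumer side, at the price of doubling the length.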