Subject: Re: [wsrp-webservice] encoding of items on path urls
That's what I'm asking about! I presume the consumer can still assume UTF-8 URL encoding when it decodes during a rewrite.
I agree that passing a query string through a path is getting us into trouble, but we have to do something like that to make forms with method="get" work.
I was assuming that all URLs would be in UTF-8, so that we can assume the character set for all wsrp token values is going to be UTF-8 in effect. Then we can just translate to binHex.
For consumer writing, we can URL encode or convert to binHex, as the consumer should protect itself if it needs more than URL encoding. But don't forget the consumer-side decode...
What are these issues?
And double encoding does have issues with .NET and IMHO would lead to much confusion.
Don't know what you mean. Doesn't the fact that URL encoding results in URL-safe characters mean that it should be possible to put these characters anywhere in a URL?
In the end, the real question for me is whether we can dictate that all URI schemes will be "URL encoding friendly" in all places in the URI that wsrp token values could be written into. I now think the answer is "no": e.g. http://{wsrp-mode}.someFancyDNS/ with custom mode "pay%80" for {wsrp-mode}.
Inventing a URI scheme would lead to a longer but less far-fetched example.
regards,
Andre

-----Original Message-----
From: David Ward [mailto:david.ward@oracle.com]
Sent: 16 May 2003 13:32
To: Andre Kramer
Cc: wsrp-webservice@lists.oasis-open.org
Subject: Re: [wsrp-webservice] encoding of items on path urls

Hi Andre
I have a few comments inline
Regards
David
Andre Kramer wrote:
Given all this, it sounds like trying to pass a query string through a path is a bad idea!
On the way back from the f2f, I did some more testing of which chars can't be used inside b in http://x/a=b/c=d style URLs. Runs of both /s and \s are collapsed to a single "/". This is expected to be a common problem and occurs even if the /s and \s are URL encoded.
Even when URL encoding is used (? as %3f etc), the following quoted chars caused problems for .NET 2003: " %&'*:><" [not the quotes but including space]. There may be work-arounds, but we should work out of the box for ASP.NET etc.
The following are OK (some need to be URL encoded): 0123456789abcdefABCDEF_-=?#,!£^(){}[]+`¬\"+;|.,~$@
Given that:
1) the above seems arbitrary, and who knows what other problems we may find with other Web servers and portal URL construction schemes.
I disagree - that's how query strings have always worked in servlets!
2) the consumer's Web server does one URL decode on the way in, so the producer should not decode (again). This is likely to cause much confusion.
Yes - but when you URL encode, that's exactly what you are doing! You are encoding everything using non-reserved characters. It's just that .NET is misbehaving and modifying the decoded characters!
3) real-world implementations will add a digital signature to navigationalState and interactionState and encrypt them, so that consumers (and Web browsers) can't read or modify a Portlet's state.
In effect, the URI scheme used by the consumer (and the encoding scheme) is unknown (it could be a yet-to-be-invented URI type not using % URL-style encoding), so the conservative advice from http://www.w3.org/Addressing/rfc1738.txt should be to only make use of non-reserved chars known to be always safe, independent of the encoding scheme:
Using binHex on its own isn't safe for multibyte character sets - binHex can reliably encode bytes, but you would also have to use a well-defined transformation from characters into bytes, e.g. UTF-8 encoding.
<quote>Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. </quote>
i.e. alphanumerics and the special characters "$-_.+!*'(),". ":" is already used by us for "wsrp:normal" etc, so we may have to accept it also. Then my advice is that we should limit simple URL tokens to alphanumerics (0..9, a..z, A..Z) with ":_-" as separators and require that complex values, such as nav state, be binHex encoded. By binHex I mean the XML Schema http://www.w3.org/TR/xmlschema-2/#hexBinary xs:hexBinary type, and we could change our wsdl to reflect this.
I trust you're only talking about producer-side URL-rewriting here and that tokenized URLs for consumer-side rewriting will still use regular UTF-8 URL encoding.
However, this introduces a *2 in length, so producers may like to limit URL state to alphanumerics. This means the public wsdl should use xs:string, but a private (producer) wsdl could change these to xs:hexBinary so that decoding is automatic.
In summary, WSRP cannot dictate the URI scheme (or encoding scheme) used by a consumer, so only <quote>alphanumerics, the special characters "$-_.+!*'(),", </quote> are guaranteed to be safe wrt RFC1738.txt. XML Schema's xs:hexBinary is recommended for non-alphanumeric wsrp URL token values. ":" is used for wsrp constants. What do people on this SC think, before taking such a proposal to the larger group?
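For illustration only, here is a minimal Python sketch of the two rules discussed above: a check for "simple" tokens (alphanumerics plus ":_-") and a UTF-8 then hex transformation for complex values. The function names are hypothetical and are not taken from the WSRP spec or any implementation; upper-case hex is used here simply because it is the canonical xs:hexBinary form.

```python
import re

# Hypothetical helpers for illustration; not part of any WSRP API.
# Simple tokens: alphanumerics with ":", "_" and "-" as separators.
SIMPLE_TOKEN = re.compile(r"[0-9A-Za-z:_-]+")

def is_simple_token(value: str) -> bool:
    """True if the value could be written into a URL without any encoding."""
    return SIMPLE_TOKEN.fullmatch(value) is not None

def to_hex_binary(value: str) -> str:
    """Characters -> UTF-8 bytes -> hex digits (two per byte)."""
    return value.encode("utf-8").hex().upper()

def from_hex_binary(token: str) -> str:
    """Hex digits -> bytes -> characters, reversing to_hex_binary."""
    return bytes.fromhex(token).decode("utf-8")

if __name__ == "__main__":
    state = "a=b/c=d £"
    token = to_hex_binary(state)
    print(is_simple_token("wsrp:normal"), is_simple_token("pay%80"))
    print(token)
    # The token is always twice as long as the UTF-8 byte sequence.
    print(len(state.encode("utf-8")), len(token))
```

The UTF-8 step is what makes the hex form well defined for multibyte characters; the output uses only 0-9 and A-F, so it survives any URI scheme, at the cost of doubling the length.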
Another alternative is to just 'double URL encode' the 'complex' values - that way every byte that is already URL safe can be passed straight through as a single byte, and every non-URL-safe byte will be encoded as %25xx, where xx is the hex encoding of the byte value (and %25 is a URL-encoded '%'!). This may sound daft, but you might find that in a lot of implementations, navigational state and interaction state are themselves in query string form, meaning that when they are passed as a single value in a query string, URL-unsafe characters will in effect be 'double encoded'. If you binHex everything, then the size of the URL is really going to balloon, and you may quickly reach the operational limits for URLs in browsers, caches, etc.
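As a rough sketch of the double-encoding alternative, the following Python uses the standard library's urllib.parse.quote and unquote. The helper names and the sample state string are made up for illustration; the hex helper is only there for a size comparison.

```python
from urllib.parse import quote, unquote

# Hypothetical helpers for illustration; not part of any WSRP API.
def double_url_encode(value: str) -> str:
    """URL encode twice, so each '%' from the first pass becomes '%25'."""
    once = quote(value, safe="")   # e.g. "=" becomes "%3D"
    return quote(once, safe="")    # "%3D" becomes "%253D"

def double_url_decode(token: str) -> str:
    """Undo both passes (the Web server typically performs the first one)."""
    return unquote(unquote(token))

def hex_encode(value: str) -> str:
    """UTF-8 then hex, for comparing encoded lengths."""
    return value.encode("utf-8").hex().upper()

if __name__ == "__main__":
    state = "mode=view&page=2"   # state that is itself in query-string form
    doubled = double_url_encode(state)
    print(doubled)
    print(len(state), len(doubled), len(hex_encode(state)))
    assert double_url_decode(doubled) == state
```

For mostly alphanumeric state the doubled form stays well under the hex form, but each unsafe byte costs five characters rather than two, so state dominated by unsafe or multibyte characters would come out longer than hex.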
regards, Andre
--
David Ward
Principal Software Engineer
Oracle Portal
Oracle European Development Centre
520 Oracle Parkway
Thames Valley Park
Reading
Berkshire RG6 1RA
UK
Email: david.ward@oracle.com Tel: +44 118 924 5079 Fax: +44 118 924 5005