wsrp-webservice message
Subject: RE: [wsrp-webservice] encoding of items on path urls
- From: Rich Thompson <richt2@us.ibm.com>
- To: wsrp-webservice@lists.oasis-open.org
- Date: Fri, 16 May 2003 10:03:53 -0400
Is another way of looking at this to
say that there are Consumer platforms that are unable to support certain
URL encoded characters in the path portion of the URL. When such a Consumer
uses a portlet that has specified usesMethodGet=true, the Consumer will
be unable to operate in a stateless manner because it will be unable to
push all portlet URL parameter values into the path portion of the URL?
Rich Thompson
Andre Kramer <andre.kramer@eu.citrix.com>
05/16/2003 09:16 AM
To: "'David Ward'" <david.ward@oracle.com>
cc: wsrp-webservice@lists.oasis-open.org
Subject: RE: [wsrp-webservice] encoding of items on path urls
See below. From my point of view, double encoding does not solve the
problem, so it would be worse (more confusing) than the current URL-encode
advice.
-----Original Message-----
From: David Ward [mailto:david.ward@oracle.com]
Sent: 16 May 2003 14:06
To: Andre Kramer
Cc: wsrp-webservice@lists.oasis-open.org
Subject: Re: [wsrp-webservice] encoding of items on path urls
Andre Kramer wrote:
I agree that passing query strings through a path is getting us into
trouble, but we have to do something like that to make forms with
method="get" work.
I was assuming that all URLs would be in UTF-8, so that we can assume that
the char set for all wsrp token values is, in effect, UTF-8. Then we can
just translate to binHex.
For consumer writing, we can URL encode or convert to binHex, as the
consumer should protect itself if it needs more than URL encoding. But
don't forget the consumer-side decode...
That's what I'm asking about! I presume the consumer can
still assume UTF-8 URL encoding when it decodes during a rewrite.
[Andre Kramer] We certainly can do this but the consumer may have
to undo an unneeded URL encode.
And double encoding does have
issues with .NET and IMHO would lead to much confusion.
What are these issues?
[Andre Kramer] The main issue is that % characters do not work in paths.
They break .NET 2003 even with double encoding (which does not remove the
% char).
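A minimal Python sketch of the point above, using the standard library's urllib.parse.quote as a stand-in for a generic percent encoder (an assumption; .NET's own encoder is not shown): the second pass only rewrites each '%' as '%25', so the doubly encoded value still contains '%' characters.

```python
from urllib.parse import quote

def url_encode(value: str) -> str:
    # safe="" so that '/' is percent-encoded too (quote leaves '/' alone by default).
    return quote(value, safe="")

value = "a/b?c"
once = url_encode(value)   # 'a%2Fb%3Fc'
twice = url_encode(once)   # 'a%252Fb%253Fc': each '%' became '%25'
print(once)
print(twice)
```

Double encoding changes which characters follow the '%', but it can never remove the '%' itself.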
In the end, the real question for me is whether we can dictate that all
URI schemes will be "URL encoding friendly" in all places in the URI that
wsrp token values could be written into. I now think the answer is "no":
e.g. http://{wsrp-mode}.someFancyDNS/
with custom mode "pay%80" for {wsrp-mode}
Inventing a URI scheme would lead to a longer but less far-fetched example.
Don't know what you mean. Doesn't the fact that URL encoding results in
URL-safe characters mean that it should be possible to put these
characters anywhere in a URL?
[Andre Kramer] Then so does binHex encoding.
regards,
Andre
-----Original Message-----
From: David Ward [mailto:david.ward@oracle.com]
Sent: 16 May 2003 13:32
To: Andre Kramer
Cc: wsrp-webservice@lists.oasis-open.org
Subject: Re: [wsrp-webservice] encoding of items on path urls
Hi Andre
I have a few comments inline
Regards
David
Andre Kramer wrote:
On the way back from the f2f, I did some more testing of what chars can't
be used inside b in http://x/a=b/c=d style URLs.
Runs of both /s and \s are collapsed to a single "/". This is expected to
be a common problem and occurs even if the /s and \s are URL encoded.
Even when URL encoding is used (? as %3f, etc.), the following quoted chars
caused problems for .NET 2003: " %&'*:><" [not the quotes, but including
space]. There may be work-arounds, but we should work out of the box for
ASP.NET etc.
The following are OK (some need to be URL encoded):
0123456789abcdefABCDEF_-=?#,!£^(){}[]+`¬\"+;|.,~$@
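The http://x/a=b/c=d style tested above can be sketched as follows, assuming each name=value pair occupies one path segment and each value is percent-encoded once with Python's urllib.parse. The helpers to_path_url and from_path_url are hypothetical. The encoded value still carries '%' sequences (and '/' only as %2F), which is exactly the kind of input the .NET 2003 tests above report trouble with.

```python
from urllib.parse import quote, unquote

def to_path_url(base: str, params: list[tuple[str, str]]) -> str:
    # One path segment per name=value pair; safe="" encodes '/', '?', '=', '%', etc.
    segments = [f"{quote(n, safe='')}={quote(v, safe='')}" for n, v in params]
    return base.rstrip("/") + "/" + "/".join(segments)

def from_path_url(url: str, base: str) -> list[tuple[str, str]]:
    # Strip the base, split into segments, split each on the first '=', decode once.
    tail = url[len(base.rstrip("/")) + 1:]
    pairs = []
    for segment in tail.split("/"):
        name, _, value = segment.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs

url = to_path_url("http://x", [("a", "1/2 & 3"), ("c", "d")])
print(url)  # http://x/a=1%2F2%20%26%203/c=d
```

The round trip only works if nothing between browser and producer collapses, rewrites, or pre-decodes the %2F and %20 sequences.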
Given that:
1) the above seems arbitrary and who knows what other problems we may find
for other Web servers and portal URL construction schemes.
Given all this, it sounds like trying to pass a query
string through a path is a bad idea!
2) the consumer's Web server does one URL decode on the way in, so the
producer should not decode (again). This is likely to cause much confusion.
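Why a second decode by the producer is harmful, in a short Python sketch (urllib.parse again standing in for the Web server's decoder; the state value is hypothetical): a value that legitimately contains '%' followed by two hex digits survives the server's single decode but is corrupted by an extra one.

```python
from urllib.parse import quote, unquote

original = "50%41off"                   # hypothetical state value containing a literal '%'
on_the_wire = quote(original, safe="")  # '50%2541off'
after_server = unquote(on_the_wire)     # the Web server's one decode: '50%41off'
after_producer = unquote(after_server)  # an extra producer decode: '50Aoff' (%41 is 'A')
print(after_server, after_producer)
```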
I disagree - that's how query strings have always worked
in servlets!
3) real-world implementations will add a digital signature to
navigationalState and interactionState and encrypt them, so that consumers
(and Web browsers) can't read or modify a Portlet's state.
In effect, the URI scheme used by the consumer (and the encoding scheme) is
unknown (it could be a yet-to-be-invented URI type not using % URL-style
encoding), so the conservative advice from
http://www.w3.org/Addressing/rfc1738.txt should be to only make use of
non-reserved chars known to be always safe, independent of the encoding
scheme:
Yes - but when you URL encode, that's exactly what you
are doing! You are encoding everything using non-reserved characters. It's
just that .NET is misbehaving and modifying the decoded characters!
<quote>Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used unencoded
within a URL. </quote>
i.e. alphanumerics and the special characters "$-_.+!*'(),".
":" is already used by us for "wsrp:normal" etc., so we may have to accept
it also.
Then my advice is that we should limit simple URL tokens to alphanumerics
(0..9, a..z, A..Z), with ":_-" as separators, and require that complex
values, such as nav state, be binHex encoded.
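Both character sets discussed above can be checked mechanically. A minimal Python sketch, assuming the RFC 1738 set is exactly alphanumerics plus "$-_.+!*'()," (reserved characters used for their reserved purpose are ignored here) and that the proposed simple-token alphabet is alphanumerics plus ":_-"; the names RFC1738_SAFE, unsafe_chars and is_simple_token are hypothetical.

```python
import re
import string

# RFC 1738 "safe" set: alphanumerics plus the special characters $-_.+!*'(),
RFC1738_SAFE = frozenset(string.ascii_letters + string.digits + "$-_.+!*'(),")

# Proposed simple-token alphabet: alphanumerics with ':', '_' and '-' as separators.
SIMPLE_TOKEN = re.compile(r"[0-9A-Za-z:_-]+")

def unsafe_chars(token: str) -> set[str]:
    """Characters of token that fall outside the RFC 1738 safe set."""
    return {ch for ch in token if ch not in RFC1738_SAFE}

def is_simple_token(token: str) -> bool:
    """True if token is non-empty and uses only the proposed simple-token alphabet."""
    return SIMPLE_TOKEN.fullmatch(token) is not None

print(unsafe_chars("wsrp:normal"))     # {':'}
print(is_simple_token("wsrp:normal"))  # True
print(is_simple_token("pay%80"))       # False
```

The first result shows the tension noted above: ":" is outside the RFC 1738 safe set yet appears in wsrp constants such as "wsrp:normal".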
By binHex I mean the XML Schema http://www.w3.org/TR/xmlschema-2/#hexBinary
xs:hexBinary type and we could change our wsdl to reflect this.
Using binHex on its own isn't safe for multibyte character
sets - binHex can reliably encode bytes, but you would have to also use
a well-defined transformation from characters into bytes, e.g. UTF-8 encoding.
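A sketch of David's point in Python: characters are first turned into bytes with UTF-8, and only then are the bytes written as xs:hexBinary. The helper names are hypothetical; upper-case hex is used because it is the canonical xs:hexBinary form, although the lexical form also allows lower case.

```python
def to_hex_binary(value: str) -> str:
    # Characters -> bytes (UTF-8), then each byte -> two upper-case hex digits.
    return value.encode("utf-8").hex().upper()

def from_hex_binary(text: str) -> str:
    # bytes.fromhex accepts both upper- and lower-case digits; then bytes -> characters.
    return bytes.fromhex(text).decode("utf-8")

encoded = to_hex_binary("pay€")
print(encoded)  # 706179E282AC
```

Four characters become six UTF-8 bytes and twelve hex digits; without the agreed UTF-8 step, the receiver could not know how to turn those bytes back into characters.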
However, this doubles the length, so producers may like to limit URL state
to alphanumerics. This means the public WSDL should use xs:string, but a
private (producer) WSDL could change these to xs:hexBinary so that decoding
is automatic.
In summary, WSRP cannot dictate the URI scheme (or encoding scheme) used by
a consumer, so only <quote>alphanumerics, the special characters
"$-_.+!*'(),", </quote> are guaranteed to be safe wrt RFC1738.txt. XML
Schema's xs:hexBinary is recommended for non-alphanumeric wsrp URL token
values. ":" is used for wsrp constants.
What do people on this SC think, before taking such a proposal to the larger
group?
I trust you're only talking about producer-side URL-rewriting
here and that tokenized URLs for consumer-side rewriting will still use
regular UTF-8 URL encoding.
Another alternative is to just 'double URL encode' the 'complex' values
- that way every byte that is already URL safe can be passed straight through
as a single byte, and every non-URL-safe byte will be encoded as %25xx,
where xx is the hex encoding of the byte value (and %25 is a URL-encoded
'%'!). This may sound daft, but you might find that in a lot of implementations,
navigational state and interaction state are themselves in query string
form, meaning that when they are passed as a single value in a query string,
URL-unsafe characters will in effect be 'double encoded'. If you binHex
everything, then the size of the URL is really going to balloon, and you
may quickly reach the operational limits for URLs in browsers, caches,
etc.
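The size trade-off can be made concrete with a rough Python comparison, assuming urllib.parse.quote applied twice as the double URL encoder and a hypothetical navigational state already in query-string form. Safe bytes pass through unchanged and each unsafe byte costs five characters (%25 plus two hex digits), whereas hexBinary costs two characters for every byte.

```python
from urllib.parse import quote

def double_url_encode(value: str) -> str:
    # Pass 1: unsafe bytes -> %XX. Pass 2: each '%' -> '%25', giving %25XX.
    return quote(quote(value, safe=""), safe="")

def hex_binary(value: str) -> str:
    # UTF-8 bytes written as upper-case hex: always two characters per byte.
    return value.encode("utf-8").hex().upper()

nav_state = "page=2&sort=name&view=summary"  # hypothetical state in query-string form
print(len(nav_state), len(double_url_encode(nav_state)), len(hex_binary(nav_state)))
# 29 49 58
```

With mostly alphanumeric state, double encoding stays shorter; the balance shifts only when most bytes are unsafe.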
regards,
Andre
--
David Ward
Principal Software Engineer
Oracle Portal
Oracle European Development Centre
520 Oracle Parkway
Thames Valley Park
Reading
Berkshire RG6 1RA
UK