wsrp-webservice message
Subject: RE: [wsrp-webservice] encoding of items on path urls
- From: Andre Kramer <andre.kramer@eu.citrix.com>
- To: 'Rich Thompson' <richt2@us.ibm.com>, wsrp-webservice@lists.oasis-open.org
- Date: Fri, 16 May 2003 16:09:39 +0100
Yes, that is it in a nutshell.
For certain URI schemes (unfortunately including URLs for .NET) the consumer will have to either:
1) not support portlets using method GET
2) only use consumer re-writing and a special coding
The only real alternative on the table is to avoid url encoding
altogether, by only allowing safe chars (alphanumeric and "$-_.+!*'(),"),
with binHex recommended for general binary data.
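A minimal Python sketch of that alternative, assuming the safe set is exactly the alphanumerics plus the RFC 1738 specials quoted above. The helper names `is_safe_token` and `to_hex_token` are hypothetical and not part of any WSRP draft:

```python
import binascii

# Characters assumed safe: alphanumerics plus the RFC 1738 specials "$-_.+!*'(),"
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "$-_.+!*'(),"
)


def is_safe_token(value: str) -> bool:
    """Return True if every character of value is in the assumed safe set."""
    return all(ch in SAFE_CHARS for ch in value)


def to_hex_token(data: bytes) -> str:
    """Hex-encode arbitrary bytes (two hex digits per byte, upper case)."""
    return binascii.hexlify(data).decode("ascii").upper()
```

A value such as "pay%80" fails the check because of the "%", while the hex form of any byte string passes it, since hex digits are alphanumeric.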
We also have some remaining related URL issues:
a) templates are markup type dependent (and possibly character set dependent also) and we have no way to direct the producer to use one template type for a markup type.
b) consumer re-writes are not valid XML (as David raised)
c) templates can't be easily used from XSLT
d) content authors need to URL encode values they write into wsrp-rewrite?/wsrp-rewrite
Anyone trying to use consumer rewriting to solve (a) will run into (b), and content authors will be forced to URL encode text that they may be writing by hand (d).
A radical thought:
How about modifying our separator char for consumer re-writes to be e.g. "/" and allowing only safe chars for all wsrp-token values? This would remove any mention of URL encoding from our URL rewriting altogether. That would give simpler rules and solve (b) and (d)?
regards,
Andre
PS. I did not express my lack of concern with (a) very well on Wed. Tools that want to manipulate valid XML can use elements to represent re-writes, such as
<wsrp-rewrite type="namespace"><wsrp-token>someName</wsrp-token></wsrp-rewrite>,
internally and serialize this as text using a custom printer or run it through a translation (by XSLT).
Is another way of looking at this to say that there are Consumer
platforms that are unable to support certain URL encoded characters in the
path portion of the URL, and that when such a Consumer uses a portlet that has
specified usesMethodGet=true, the Consumer will be unable to operate in a
stateless manner because it will be unable to push all portlet URL parameter
values into the path portion of the URL?
Rich Thompson
Andre Kramer <andre.kramer@eu.citrix.com>
05/16/2003 09:16 AM
To: "'David Ward'" <david.ward@oracle.com>
cc: wsrp-webservice@lists.oasis-open.org
Subject: RE: [wsrp-webservice] encoding of items on path urls
see below. From my point of view, double encoding does not solve the
problem, so would be worse (more confusion) than the current url encode
advice.
-----Original Message-----
From: David Ward [mailto:david.ward@oracle.com]
Sent: 16 May 2003 14:06
To: Andre Kramer
Cc: wsrp-webservice@lists.oasis-open.org
Subject: Re: [wsrp-webservice] encoding of items on path urls
Andre Kramer wrote:
I agree that passing a
query strings through a path is getting us into trouble but we have to do
something like that to make forms with method="get" work.
I was assuming that all urls would be in UTF-8, so that the char set for all
wsrp token values is going to be UTF-8 in effect. Then we can just translate
to binHex.
For consumer writing, we can URL encode or convert to binHex
as the consumer should protect itself if it needs more than url encoding. But
don't forget the consumer side decode...
That's what
I'm asking about! I presume the consumer can still assume UTF-8 URL encoding
when it decodes during a rewrite.
[Andre Kramer] We certainly can do this but the consumer may
have to undo an unneeded URL encode.
And double encoding does have issues
with .NET and IMHO would lead to much confusion.
What
are these issues?
[Andre Kramer]
The main issue is that %s are not working in paths. They break .NET 2003 even
with double encoding (which does not remove the % char).
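A short Python illustration of why double encoding leaves the "%" in place. It uses the standard library's `urllib.parse.quote`; the sample value "a b" is made up for the example:

```python
from urllib.parse import quote

value = "a b"

# First pass: the space becomes %20.
once = quote(value, safe="")

# Second pass: the "%" itself is encoded as %25, so a "%" is still present.
twice = quote(once, safe="")

print(once)   # a%20b
print(twice)  # a%2520b
```

Any platform that rejects "%" in a path segment will therefore reject the double-encoded form too.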
In the end, the real question for me is if we can dictate that all uri
schemes will be "URL encoding friendly" in all places on the uri that wsrp
token values could be written into. I now think the answer is "no":
e.g.
http://{wsrp-mode}.someFancyDNS/ with custom mode "pay%80" for {wsrp-mode}
Inventing a uri scheme would lead to a
longer but less far fetched example.
Don't know what you mean. Doesn't the fact that URL encoding
results in URL safe characters mean that it should be possible to put these
characters anywhere in a URL?
[Andre Kramer] Then so does binHex encoding.
regards,
Andre
-----Original Message-----
From: David Ward [mailto:david.ward@oracle.com]
Sent: 16 May 2003 13:32
To: Andre Kramer
Cc: wsrp-webservice@lists.oasis-open.org
Subject: Re: [wsrp-webservice] encoding of items on path urls
Hi Andre
I have a few comments
inline
Regards
David
Andre Kramer wrote:
On the way back from the f2f, I did some more testing of what chars can't be used inside b in http://x/a=b/c=d style urls.
Both runs of /s and \s are collapsed to a single "/". This is expected to be a common problem and occurs even if the /s and \s are url encoded.
Even when URL encoding is used (? as %3f etc), the following quoted chars caused problems for .NET 2003: " %&'*:><" [not the quotes but including space]. There may be work-arounds but we should work out of the box for ASP.NET etc.
The following are OK (some need to be URL encoded):
0123456789abcdefABCDEF_-=?#,!£^(){}[]+`¬\"+;|.,~$@
Given that:
1) the above seems arbitrary and who knows what
other problems we may find
for other Web servers and portal URL
construction schemes.
Given all this, it sounds
like trying to pass a query string through a path is a bad idea!
2) the consumer's Web Server does one URL decode on the
way in, so the
producer should not decode (again). This is likely to cause
much confusion.
I disagree - that's how query
strings have always worked in servlets!
3) real
world implementations will add a digital signature to
navigationalState and
interactionState and encrypt, so that consumers (and
Web browsers) can't
read or modify a Portlet's state.
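The decoding concern in point 2 can be shown with a small Python sketch. The value "a%2Bb" is a made-up state string that legitimately contains a literal "%", and the variable names are illustrative only:

```python
from urllib.parse import quote, unquote

# Hypothetical state value that contains a literal percent sign.
original = "a%2Bb"

# The value is URL encoded once before being written into the URL.
on_the_wire = quote(original, safe="")

# The consumer's web server decodes once on the way in: value is intact.
decoded_once = unquote(on_the_wire)

# A second decode by the producer silently changes the value.
decoded_twice = unquote(decoded_once)

print(on_the_wire)    # a%252Bb
print(decoded_once)   # a%2Bb
print(decoded_twice)  # a+b
```

The number of decodes has to match the number of encodes exactly, which is why an unspecified extra decode on either side is a source of confusion.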
In effect, the URI scheme used by
the consumer (and the encoding scheme) is
unknown (it could be a
yet-to-be-invented URI type not using % url style
encoding), so the
conservative advice from
http://www.w3.org/Addressing/rfc1738.txt should be to only make use of
non-reserved chars known to be
always safe, independent of the encoding
scheme:
Yes - but when you URL encode, that's exactly what you are doing! You
are encoding everything using non-reserved characters. It's just that .NET is
misbehaving and modifying the decoded characters!
<quote>Thus, only alphanumerics, the special characters
"$-_.+!*'(),", and
reserved characters used for their reserved purposes may
be used unencoded
within a URL. </quote>
i.e. alphanumerics and
the special characters "$-_.+!*'(),"
":" is already used by us for
"wsrp:normal" etc so we may have to accept it
also.
Then my advice is that we should limit simple url tokens to alphanumeric (0..9, a..z, A..Z) with ":_-" as separators and require that complex values, such as nav state, be binHex encoded.
By binHex I mean the XML Schema xs:hexBinary type (http://www.w3.org/TR/xmlschema-2/#hexBinary) and we could change our wsdl to reflect this.
Using binHex on its own isn't safe for
multibyte character sets - binHex can reliably encode bytes, but you would
have to also use a well-defined transformation from characters into bytes,
e.g. UTF-8 encoding.
However, this introduces a *2
in length, so producers may like to limit url
state to alphaNumeric. This
means the public wsdl should use xs:string but a
private (producer) wsdl
could change these to xs:hexBinary so that decoding
is
automatic.
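A minimal Python sketch of the two points above: characters are first turned into bytes with UTF-8, then each byte becomes two hex digits, which doubles the length at least. The function names `encode_state` and `decode_state` are hypothetical:

```python
def encode_state(text: str) -> str:
    """Map characters to bytes via UTF-8, then to upper-case hex digits."""
    return text.encode("utf-8").hex().upper()


def decode_state(token: str) -> str:
    """Reverse encode_state: hex digits to bytes, then UTF-8 to characters."""
    return bytes.fromhex(token).decode("utf-8")


token = encode_state("£5")
print(token)                # C2A335
print(decode_state(token))  # £5
```

The "£" occupies two bytes in UTF-8, so the two-character input becomes six hex digits; a pure ASCII input of n characters becomes 2n digits.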
In summary, WSRP cannot dictate the URI scheme (or encoding scheme) used by a consumer, so only <quote>alphanumerics, the special characters "$-_.+!*'(),", </quote> are guaranteed to be safe wrt RFC1738.txt. XML Schema's xs:hexBinary is recommended for non-alphanumeric wsrp url token values. ":" is used for wsrp constants.
What do people on this SC think, before taking such a
proposal to the larger
group?
I trust you're
only talking about producer-side URL-rewriting here and that tokenized URLs
for consumer-side rewriting will still use regular UTF-8 URL
encoding.
Another alternative is to just 'double URL encode' the
'complex' values - that way every byte that is already URL safe can be passed
straight through as a single byte, and every non-URL safe byte will be encoded
as %25xx, where xx is the hex encoding of the byte value (and %25 is a URL
encoded '%'!). This may sound daft, but you might find that in a lot of
implementations, navigational state and interaction state are themselves in
query string form, meaning that when they are passed as a single value in a
query string, URL unsafe characters will in effect be 'double encoded'. If you
binHex everything, then the size of the URL is really going to balloon, and
you may quickly reach the operational limits for URLs in browsers, caches,
etc.
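A rough Python comparison of the two approaches for a state value that is itself in query string form. The sample value is made up, and `urllib.parse.quote` stands in for whatever encoder an implementation actually uses:

```python
from urllib.parse import quote

# Hypothetical navigational state in query string form.
state = "page=2&sort=name"

# Double URL encoding: only the three unsafe characters grow (to 5 chars each).
double_encoded = quote(quote(state, safe=""), safe="")

# Hex encoding: every byte grows to two hex digits.
hex_encoded = state.encode("utf-8").hex()

print(double_encoded, len(double_encoded))  # page%253D2%2526sort%253Dname 28
print(hex_encoded, len(hex_encoded))
```

For this mostly alphanumeric value the double-encoded form is 28 characters against 32 for hex. The balance shifts toward hex as the share of unsafe bytes rises, since each unsafe byte costs five characters when double encoded and only two in hex.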
regards,
Andre
--
David Ward Principal Software
Engineer Oracle Portal
| Oracle European Development Centre 520
Oracle Parkway Thames Valley Park Reading Berkshire RG6
1RA UK