OASIS Mailing List Archives



odata-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Query String parsing in URIs

I have a comment concerning the parsing of query strings from URIs in OData.

The issue is highlighted by an existing gap in the specification: it does not address how to escape the SQUOTE delimiter inside a string literal.

stringUriLiteral = SQUOTE [*characters] SQUOTE
characters = UTF8-char

The following question on Stack Overflow illustrates this confusion:

The pragmatic solution, which appears to work, is to double the quote character as if the production were:
stringUriLiteral = SQUOTE [*characters] SQUOTE [stringUriLiteral]
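To illustrate the doubled-quote production, here is a minimal Python sketch of a literal parser. The function name parse_string_literal and its (value, next position) return shape are assumptions made for illustration; they are not taken from the OData ABNF or from any existing library.

```python
def parse_string_literal(text, pos=0):
    """Parse a single-quoted literal starting at text[pos].

    Returns (value, next_pos). Inside the literal, a doubled quote ('')
    stands for one quote character, which is equivalent to reading
    'abc''def' as two adjacent literals joined by a quote.
    """
    if pos >= len(text) or text[pos] != "'":
        raise ValueError(f"expected SQUOTE at position {pos}")
    pos += 1
    chars = []
    while pos < len(text):
        ch = text[pos]
        if ch == "'":
            # A quote followed by another quote is an escaped quote.
            if pos + 1 < len(text) and text[pos + 1] == "'":
                chars.append("'")
                pos += 2
                continue
            # Otherwise this quote closes the literal.
            return "".join(chars), pos + 1
        chars.append(ch)
        pos += 1
    raise ValueError("unterminated string literal")
```

Note that the parser works on a string of characters and knows nothing about percent-encoding, so the same rule could be applied to literals found in XML or JSON documents.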

I'd like to support this general approach and respectfully disagree with the view that the correct solution is to use percent-encoding. My reasoning is that percent-encoding belongs to a different level of the parser; mixing it up with the parsing of OData constructs risks problems with double-encoding and will inevitably lead to the ABNF for OData becoming more complicated than necessary.

Indeed, a look at the following trunk (which is public but is presumably being used internally by the TC):


suggests that this is exactly what is happening.  Query strings in OData URIs are becoming opaque to systems that understand the ordinary parsing of parameter name/value pairs (e.g., from HTML forms).  This just makes extra work for OData libraries and is likely to lead to confusion and double-encoding.  It also prevents the same rules from being used for parsing literals from other contexts (such as from XML or JSON documents) where a lower-level parser will have already transformed data into strings of Unicode characters and removed any base encoding.

My proposal is that the query component of the URI is parsed according to the rules of the application/x-www-form-urlencoded type defined by HTML, with the additional interpretation that the resulting byte sequences are treated as UTF-8 encoded as per the guidelines for IRIs.  This will result in a widely supported mapping from parameter names to (a list of) parameter values as strings of Unicode characters.
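As a rough illustration of this first parsing level, Python's standard library already implements form-urlencoded parsing with UTF-8 decoding. The wrapper name parse_query and the sample query strings are assumptions made for this sketch.

```python
from urllib.parse import parse_qs


def parse_query(query):
    """Map parameter names to lists of Unicode string values.

    Percent-escapes are decoded as UTF-8; invalid byte sequences raise
    UnicodeDecodeError instead of being silently replaced.
    """
    return parse_qs(
        query,
        keep_blank_values=True,
        encoding="utf-8",
        errors="strict",
    )
```

For example, parse_query("$filter=Name%20eq%20%27Caf%C3%A9%27&$top=1") yields a dictionary whose "$filter" entry is the list ["Name eq 'Café'"]. Nothing at this level needs to know about OData syntax.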

For compatibility, URIs with query strings in which unreserved (by RFC 1738) characters are passed unescaped (such as SQUOTE for example) would be permitted and not raise errors.  This is in keeping with the behaviour of most web frameworks and browsers anyway.

So the following URL (which currently resolves to information about the actor Peter O'Toole in the Netflix catalog):


would be converted to a parameter dictionary containing an entry like the following:

$filter -> Name eq 'Peter O''Toole'

The OData rules would then concern themselves with parsing this value and the double-SQUOTE rule would come into effect.
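The two levels can be sketched end to end in Python as follows. The URL uses a placeholder host and path (example.org is hypothetical and is not the Netflix service), and the slicing step assumes a well-formed filter of the shape "Name eq '<literal>'".

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: host and path are placeholders for illustration only.
url = "http://example.org/People?$filter=Name%20eq%20'Peter%20O''Toole'"

# Level 1: generic query-string parsing, no OData knowledge required.
params = parse_qs(urlsplit(url).query)
filter_expr = params["$filter"][0]

# Level 2: OData rules applied to the decoded value. The literal is the
# text after " eq "; strip the delimiters and undo the quote doubling.
_, _, literal = filter_expr.partition(" eq ")
name = literal[1:-1].replace("''", "'")

print(filter_expr)
print(name)
```

The unescaped SQUOTE characters in the URL pass through the generic parser untouched, in line with the compatibility rule above.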

If percent-encoding were used, then there is a significant risk that the URL might appear as follows (note the encoding of O'Toole):


It is quite likely that in many applications developers will attempt to reuse a built-in library for parsing the query string rather than following the OData specific query string parsing rules.  Such applications will end up with a dictionary containing the following entry:

$filter -> Name eq 'Peter O'Toole'

This would cause an error, as the value doesn't satisfy the string production.  I'm not holding up the Netflix application as a reference implementation, but note that the second form of the URL does not work against their current implementation and returns:

<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
<message xml:lang="en-US">Syntax error at position 22.</message>
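The failure mode can be reproduced with a generic query parser. In the Python sketch below, the helper names odata_quote and filter_value, the LITERAL pattern, and both query strings are assumptions constructed for illustration; they are not the URLs from the Netflix catalog.

```python
import re
from urllib.parse import parse_qs, quote

# Matches a single-quoted literal in which '' stands for one quote.
LITERAL = re.compile(r"'(?:[^']|'')*'")


def odata_quote(value):
    """Wrap a string in SQUOTEs, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def filter_value(query):
    """Decode the query with a generic parser and return $filter."""
    return parse_qs(query)["$filter"][0]


# Quote doubling first, percent-encoding second: survives decoding.
good = "$filter=" + quote("Name eq " + odata_quote("Peter O'Toole"))

# Embedded quote "escaped" only by percent-encoding: once the generic
# parser has decoded %27, it is indistinguishable from a delimiter.
bad = "$filter=Name%20eq%20%27Peter%20O%27Toole%27"

for query in (good, bad):
    value = filter_value(query)
    literal = value[len("Name eq "):]
    print(value, "->", bool(LITERAL.fullmatch(literal)))
```

After decoding, the first query gives a well-formed literal and the second does not, because the information that the inner quote was meant as data has been lost.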

Steve Lay
