Subject: RE: [cmis] Groups - cmis-spec-v1.0e1-draft.doc uploaded
Given the confusion even within the TC, we certainly need to
include in the Errata a clarification on character escaping for text search
expressions. I suppose we all agreed to decouple the text search
expression grammar from the SQL query grammar, as articulated by Florent in
#660. Since a text search expression is embedded in a SQL query
statement, there are logically two levels of escaping. In reality, there may be additional levels of escaping if a
query statement is further embedded in yet another string, depending on the protocol
binding and messaging underneath. Nevertheless, from a CMIS data model perspective, we can assume a
complete query statement is delivered to a server. The question is what character
escaping is needed to allow the server to parse the statement correctly and unambiguously.

Logically, the server must first parse the query statement and
extract the text search expression correctly. Then it must parse the text search
expression correctly in order to perform a correct search. While parsing is
done outside-in, composition is done inside-out.

At step 1, an application has
to first compose an unambiguous text search expression. Given the text search grammar
in v1.0, we must pick a specific escape character for that (sub)grammar, and mandate
escaping for any occurrence of three characters in a text search expression: '-',
<quote>, and the escape character.

At step 2, the application encloses this
expression in <quote>s and embeds it in CONTAINS() to build a complete
query statement. For the SQL-level grammar, we should use the SQL escape
character (unless we want to change it or to allow an alternative escape
character). In order for a SQL-level parser to correctly extract the text
search expression, we need a second level of escaping for any occurrence of two
characters, <quote> and the SQL escape character, in an expression
produced by step 1. If we allow an alternative escape character at the SQL
level, then any occurrence of that character in the expression produced by step
1 must be escaped as well.

This design seems necessary and sufficient. A SQL-level parser would
peel off the step 2 escapes to correctly extract a text search expression. Then
a text search parser would peel off the step 1 escapes to correctly interpret
the text search request.

In any case, escaping must be precisely defined and mandated to
avoid ambiguity.

Thoughts?

David

From: Steve Roth [mailto:steve.roth@oracle.com]

Yes, the syntax change from single to double-quotes was
excluded because it would break compatibility (BNF change).

What we should clarify in the BNF (and what
was my intent all along when writing this stuff) is that the BNF for <text
search expression>, described after

Unless others disagree, I think this should go into the 1.0
errata as well.

Within a word or phrase, each
(single-)quote must also be escaped by a preceding backslash “\”

Repositories MUST
support the escaping of characters using a backslash (\) in the query
statement. This escaping applies specifically to the <character string
literal> (new text). The backslash character (\) will be used to escape
characters within quoted strings in the query as follows:

Do we really need to escape both words and character string
literals? It seems we certainly need to clarify if this is the case
or not.

Thanks
Ryan for pointing back to the source; this helps. Am I correct that we have
excluded a syntax change for the errata because it would break
compatibility? This means we still have to follow the double escaping rules. I
think that, besides the example, some more text would be helpful, mentioning that:

- For CONTAINS, double escaping needs to be performed (CMIS escaping for text search expressions according to 2.1.10.3, and SQL escaping to get a valid character string literal).
- We should state the order in which both escapings have to be performed (e.g. first CMIS escaping, then SQL escaping).
- Have the example

If
I follow these rules I come to the following conclusion:

That's -> That\'s // CMIS escaping rule: backslash, according to section 2.1.10.3 of the spec
That\'s -> That\''s // SQL escaping rule: double a single quote
That\''s -> 'That\''s' // put single quotes around it to get a valid SQL character string literal

Where is the point where I am wrong?

Looking back at CMIS-660, it seems that we first discussed a syntax change that we later postponed to a later spec revision. It is possible that we still have some uncertainty about what is actually in place for 1.0.

Jens

From: Ryan McVeigh [mailto:rmcveigh@ziaconsulting.com]
Jens,

This was taken from the comments of CMIS-660, specifically Florent's comment. If the doc isn't clear yet, we haven't successfully generated our errata. :) Let me know whether this makes sense after reading Florent's comment, and how we can further clarify.

Thanks,
-Ryan

On Wed, Jan 19, 2011 at 10:41 AM, Jens Hübel <jhuebel@opentext.com> wrote:

I have a question regarding
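
For readers following the thread, the two-level escaping discussed above (text search escaping first, then string-literal escaping, with parsing done in the reverse order) can be sketched roughly as below. This is only an illustration under stated assumptions, not text from the spec or the errata: the function names are hypothetical, the text search escape character is assumed to be a backslash, and two candidate SQL-level rules are shown side by side because the thread itself has not settled which one applies (backslash escaping, or doubling the single quote as in Jens's example).

```python
# Illustrative sketch only; all names here are hypothetical, not CMIS-defined.
TEXT_ESCAPE = "\\"  # assumed escape character for the text search (sub)grammar
QUOTE = "'"


def escape_text_search_term(term: str) -> str:
    """Step 1: inside one word or phrase, escape '-', the quote,
    and the escape character itself."""
    out = []
    for ch in term:
        if ch in ("-", QUOTE, TEXT_ESCAPE):
            out.append(TEXT_ESCAPE)
        out.append(ch)
    return "".join(out)


def escape_literal_backslash(expr: str) -> str:
    """Step 2, variant A (assumption): backslash-escape the backslash
    and the quote in the expression produced by step 1."""
    return expr.replace("\\", "\\\\").replace("'", "\\'")


def escape_literal_doubling(expr: str) -> str:
    """Step 2, variant B (as in Jens's example): double each single quote."""
    return expr.replace("'", "''")


def build_contains(term: str, sql_escape=escape_literal_backslash) -> str:
    """Compose inside-out: step 1, then step 2, then wrap in quotes."""
    return "CONTAINS('" + sql_escape(escape_text_search_term(term)) + "')"


def peel_backslash_escapes(s: str) -> str:
    """Remove one level of backslash escaping (used when parsing outside-in)."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            i += 1  # skip the escape character, keep the escaped one
        out.append(s[i])
        i += 1
    return "".join(out)
```

With variant B, build_contains("That's", escape_literal_doubling) reproduces the literal from Jens's example. With variant A, applying peel_backslash_escapes twice to the step 2 output recovers the original term, which mirrors the "peel off step 2, then step 1" description in David's message.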