OASIS Mailing List Archives

 



cmis message



Subject: RE: [cmis] Groups - cmis-spec-v1.0e1-draft.doc uploaded


Given the confusion even within the TC, we certainly need to include in the Errata a clarification of character escaping for text search expressions.

 

I believe we all agreed to decouple the text search expression grammar from the SQL query grammar, as articulated by Florent in #660.

Since a text search expression is embedded in a SQL query statement, there are indeed logically two levels of escaping.

In reality, there may be additional levels of escaping if a query statement is further embedded in yet another string, depending on the underlying protocol binding and messaging.

Nevertheless, from a CMIS data model perspective, we can assume that a complete query statement is delivered to the server. The question is what character escaping is needed for the server to parse the statement correctly and unambiguously.

 

Logically, the server must first parse the query statement and extract the text search expression correctly. Then it must parse the text search expression correctly in order to perform the right search. Parsing proceeds outside-in, whereas composition proceeds inside-out:

Step 1: The application first composes an unambiguous text search expression. Given the text search grammar in v1.0, we must pick a specific escape character for that (sub)grammar and mandate escaping for every occurrence of three characters in a text search expression: ‘-’, <quote>, and the escape character itself.

Step 2: The application encloses this expression in <quote>s and embeds it in CONTAINS() to build a complete query statement. For the SQL-level grammar, we should use the SQL escape character (unless we want to change it or allow an alternative escape character). For a SQL-level parser to extract the text search expression correctly, we need a second level of escaping for every occurrence of two characters, <quote> and the SQL escape character, in the expression produced by step 1. If we allow an alternative escape character at the SQL level, every occurrence of that character in the expression produced by step 1 must be escaped as well.
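A minimal Python sketch of the two composition steps, assuming that backslash is the escape character at both levels (at the text search level per 2.1.10.2.4.3 and at the SQL level per 2.1.10.3). The helper names escape_word, escape_sql_literal and contains_clause are hypothetical, and the sketch only escapes a single literal word rather than building multi-term expressions with operators.

```python
# Hypothetical helpers sketching the two-step composition (inside-out).
# Assumption: backslash is the escape character at both levels.

TEXT_SEARCH_SPECIALS = {"-", "'", "\\"}  # step 1: '-', <quote>, escape character
SQL_SPECIALS = {"'", "\\"}               # step 2: <quote>, SQL escape character


def _escape(text, specials):
    """Prefix every character found in `specials` with a backslash."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def escape_word(word):
    """Step 1: make a literal word safe inside a text search expression."""
    return _escape(word, TEXT_SEARCH_SPECIALS)


def escape_sql_literal(expr):
    """Step 2: escape the step 1 output and wrap it in quotes as a SQL literal."""
    return "'" + _escape(expr, SQL_SPECIALS) + "'"


def contains_clause(expr):
    """Embed an already step-1-escaped text search expression in CONTAINS()."""
    return "CONTAINS(" + escape_sql_literal(expr) + ")"


print(contains_clause(escape_word("that's")))  # CONTAINS('that\\\'s')
print(contains_clause(escape_word("e-mail")))  # CONTAINS('e\\-mail')
```

The order matters because the parser undoes step 2 before step 1: each layer of escapes has to wrap the complete output of the previous step.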

 

This design seems necessary and sufficient. A SQL-level parser would peel off the step 2 escapes to extract the text search expression correctly; a text search parser would then peel off the step 1 escapes to interpret the text search request correctly.

In any case, escaping must be precisely defined and mandated to avoid ambiguity.
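The parsing side can be sketched the same way, again assuming backslash escaping at both levels. The SQL-level pass reads the quoted literal up to the first unescaped quote and strips one layer of escapes; the text search pass then strips the second. All function names are hypothetical, and the lexer is deliberately minimal (it only looks at the first CONTAINS( in the statement).

```python
# Hypothetical two-pass decoder (outside-in): SQL level first, then text search level.
# Assumption: a backslash escapes the next character at both levels.


def read_sql_literal(s, start):
    """Read a quoted literal at s[start]; return (body without SQL escapes, end index)."""
    if s[start] != "'":
        raise ValueError("expected an opening quote")
    out, i = [], start + 1
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling escape character")
            out.append(s[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        elif ch == "'":
            return "".join(out), i + 1  # an unescaped quote closes the literal
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated literal")


def extract_contains_arg(query):
    """SQL-level pass: pull the text search expression out of CONTAINS('...')."""
    start = query.index("CONTAINS(") + len("CONTAINS(")
    expr, end = read_sql_literal(query, start)
    if query[end:end + 1] != ")":
        raise ValueError("expected ')' after the literal")
    return expr


def unescape_word(expr):
    """Text search pass: drop the step 1 backslashes inside a single word."""
    out, i = [], 0
    while i < len(expr):
        if expr[i] == "\\" and i + 1 < len(expr):
            out.append(expr[i + 1])
            i += 2
        else:
            out.append(expr[i])
            i += 1
    return "".join(out)


query = r"SELECT cmis:name FROM cmis:document WHERE CONTAINS('that\\\'s')"
expr = extract_contains_arg(query)
print(expr)                 # that\'s
print(unescape_word(expr))  # that's
```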

 

Thoughts?

 

David

 

From: Steve Roth [mailto:steve.roth@oracle.com]
Sent: Thursday, January 20, 2011 2:55 PM
To: Jens Hübel
Cc: Ryan McVeigh; florian.mueller@alfresco.com; cmis@lists.oasis-open.org
Subject: Re: [cmis] Groups - cmis-spec-v1.0e1-draft.doc uploaded

 

Yes, the syntax change from single to double quotes was excluded because it would break compatibility (BNF change).

I agree with Jens that it's a good idea to explicitly state the double-escaping and clarify the order.

I also agree with Florent's comment that

What we should clarify in the BNF (and what was my intent all along when writing this stuff) is that the BNF for <text search expression>, described after 
 !! This is full-text search criteria. 
is really the BNF for the text search expression WITH SQL-LEVEL ESCAPING REMOVED, so that the two grammars are not mixed. 

Unless others disagree, I think this should go into the 1.0 errata as well.



I think the confusion here is that there are rules both for escaping words and for escaping character string literals. If you take them strictly, it sounds like you need to escape twice. I think that's the way Florent was reading this.

These are the rules which Florent originally pointed out (in a different order):

Regarding escaping: 
1)- 2.1.10.2.4.3 explains that if we want a quote to be part of a word, then it has to be backslash-escaped

Within a word or phrase, each (single-)quote must also be escaped by a preceding backslash “\”


2)- SQL says that <quote> is escaped as <quote><quote> inside SQL <character string literal>, 

3)- 2.1.10.3 says that <backslash> is also allowed (required) for escaping inside quoted strings, i.e., inside <character string literal>,

  Repositories MUST support the escaping of characters using a backslash (\) in the query statement. This escaping applies specifically to the <character string literal> (new text). The backslash character (\) will be used to escape characters within quoted strings in the query as follows:


Which results in:

That's   -> That\'s      // in a word, quotes must be escaped with a preceding backslash per rule 1

That\'s  -> That\''s     // SQL escaping rule, double a single quote per rule 2

That\''s -> 'That\\''s'  // character string literal escaping per rule 3
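To make the strict reading concrete, here is a literal Python transcription of the three rules in the order Steve applies them, assuming the plain ASCII apostrophe is the quote character. The function names are hypothetical; the sketch only shows that reading the rules cumulatively produces three layers of escaping, not that this encoding is the intended one.

```python
# Hypothetical transcription of the strict reading of rules 1-3.
# Assumption: the quote character is the ASCII apostrophe (').


def rule1_word(word):
    """Rule 1 (2.1.10.2.4.3): backslash before each quote inside a word."""
    return word.replace("'", "\\'")


def rule2_sql_quote(text):
    """Rule 2 (SQL-92): double each quote inside a character string literal."""
    return text.replace("'", "''")


def rule3_backslash(text):
    """Rule 3 (2.1.10.3), as read here: escape each backslash, then add the enclosing quotes."""
    return "'" + text.replace("\\", "\\\\") + "'"


step1 = rule1_word("That's")
step2 = rule2_sql_quote(step1)
step3 = rule3_backslash(step2)
print(step1, step2, step3)  # That\'s That\''s 'That\\''s'
```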

 

Do we really need to escape both words and character string literals? At the very least, we need to clarify whether this is the case.


-Steve


On 01/20/2011 06:06 AM, Jens Hübel wrote:

Thanks, Ryan, for pointing back to the source; this helps. Am I correct that we excluded a syntax change from the errata because it would break compatibility? If so, we still have to follow the double escaping rules. I think that, besides the example, some more text would be helpful, stating that:

For CONTAINS, double escaping needs to be performed (CMIS escaping for text search expressions according to 2.1.10.3, and SQL escaping to get a valid character string literal).

We should state the order in which the two escapings must be performed (e.g. first CMIS escaping, then SQL escaping).

We should include the example.

 

If I follow these rules, I come to the following conclusion:

That's   -> That\'s      // CMIS escaping rule by backslashes according to section 2.1.10.3 of the spec

That\'s  -> That\''s     // SQL escaping rule, double a single quote

That\''s -> 'That\''s'   // put single quotes around it to get a valid SQL character string literal
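A small Python sketch of Jens's order (CMIS backslash escaping first, then SQL quote doubling), together with two assumed ways a server might read the resulting literal. The helper names are hypothetical. Reading with SQL-92 quote doubling alone recovers That\'s; reading with backslash also acting as an escape (one possible interpretation of 2.1.10.3) closes the literal early, which is the kind of ambiguity the thread is trying to resolve.

```python
# Hypothetical sketch: Jens's encoding and two assumed readings of the result.


def jens_encode(word):
    cmis = word.replace("'", "\\'")  # CMIS text search escaping: backslash before a quote
    sql = cmis.replace("'", "''")    # SQL escaping: double each quote
    return "'" + sql + "'"           # enclose in quotes as a character string literal


def read_literal(lit, backslash_escapes):
    """Read the quoted literal starting at lit[0]; return (body, index past closing quote).

    '' always stands for one quote. If backslash_escapes is True, a backslash
    also escapes the following character (an assumed reading of 2.1.10.3).
    """
    out, i = [], 1
    while i < len(lit):
        ch = lit[i]
        if backslash_escapes and ch == "\\" and i + 1 < len(lit):
            out.append(lit[i + 1])
            i += 2
        elif ch == "'" and lit[i + 1:i + 2] == "'":
            out.append("'")
            i += 2
        elif ch == "'":
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated literal")


lit = jens_encode("That's")
print(lit)  # 'That\''s'
body, end = read_literal(lit, backslash_escapes=False)
print(body, end == len(lit))  # That\'s True   (whole literal consumed)
body, end = read_literal(lit, backslash_escapes=True)
print(body, end == len(lit))  # That' False    (literal closed early, s' left over)
```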

 

Where am I going wrong?

 

Looking back at CMIS-660, it seems that we first discussed a syntax change that we later postponed to a future spec revision. We may still have some uncertainty about what is actually in place for 1.0.

 

Jens

 

 

From: Ryan McVeigh [mailto:rmcveigh@ziaconsulting.com]
Sent: Wednesday, January 19, 2011 6:45 PM
To: Jens Hübel
Cc: florian.mueller@alfresco.com; cmis@lists.oasis-open.org
Subject: Re: [cmis] Groups - cmis-spec-v1.0e1-draft.doc uploaded

 

Jens,

 

This was taken from the comments on CMIS-660, specifically Florent's comment. If the doc isn't clear yet, we haven't successfully generated our errata. :) After reading Florent's comment, let me know whether this makes sense and how we can clarify it further.

 

Thanks,

 

-Ryan

On Wed, Jan 19, 2011 at 10:41 AM, Jens Hübel <jhuebel@opentext.com> wrote:

I have a question regarding

2.20 CMIS-697: Word plus phrase example for CONTAINS query

o Ultimately yielding a query with: AND CONTAINS('that\\\'s')

Can someone please explain why we need to escape the backslash? I can't find any such rule in the SQL-92 syntax.
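One way to see why the backslash itself is escaped in the CMIS-697 example: if 2.1.10.3 makes backslash an escape character at the SQL level (as the draft example appears to assume), the backslash added by the word-level rule has to survive the SQL pass. The sketch below uses a hypothetical helper and a deliberately simple reader that trusts the outer quotes, and compares the two candidate literals.

```python
# Hypothetical SQL-level reader under the assumed 2.1.10.3 reading:
# a backslash escapes the next character inside the quoted literal.


def read_backslash_literal(lit):
    """Strip the outer quotes and one layer of backslash escapes."""
    if not (lit.startswith("'") and lit.endswith("'")):
        raise ValueError("not a quoted literal")
    body, out, i = lit[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


with_escape = r"'that\\\'s'"  # as in the CMIS-697 example
without = r"'that\'s'"        # word-level backslash left unescaped

print(read_backslash_literal(with_escape))  # that\'s  (text search parser still sees an escaped quote)
print(read_backslash_literal(without))      # that's   (the word-level escape has been lost)
```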

Thanks Jens


-----Original Message-----
From: florian.mueller@alfresco.com [mailto:florian.mueller@alfresco.com]
Sent: Wednesday, January 19, 2011 4:26 PM
To: cmis@lists.oasis-open.org
Subject: [cmis] Groups - cmis-spec-v1.0e1-draft.doc uploaded

The document revision named cmis-spec-v1.0e1-draft.doc has been submitted
by Mr. Florian Mueller to the OASIS Content Management Interoperability
Services (CMIS) TC document repository.  This document is revision #1 of
cmis-spec-v1.0e1-draft.doc.

Document Description:


View Document Details:
http://www.oasis-open.org/committees/document.php?document_id=40811

Download Document:
http://www.oasis-open.org/committees/download.php/40811/cmis-spec-v1.0e1-draft.doc

Revision:
This document is revision #1 of cmis-spec-v1.0e1-draft.doc.  The document
details page referenced above will show the complete revision history.



-OASIS Open Administration




--
Ryan McVeigh
Director of Enterprise Integration
office: 303.443.4004 x204
cell: 720.841.4838
fax: 877.569.7942


