
cmis message



Subject: [OASIS Issue Tracker] Commented: (CMIS-660) Clarification needed on the use of quotes in a CONTAINS() query to search on phrases



    [ http://tools.oasis-open.org/issues/browse/CMIS-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=20024#action_20024 ] 

David Choy commented on CMIS-660:
---------------------------------

Thanks to the various suggestions, we now have at least 5 ideas to be resolved. Below is where we apparently stand. Please feel free to agree or disagree. Hopefully a consensus can emerge.

(1) Do we want quotes around text-search-expression?
The TC seemed to be leaning towards keeping the quotes.

(2) Shall we put quotes around each word, treating it as a special case of a phrase?
This is an interesting idea. Any comment?

(3) Do we use single-quotes or double-quotes for a phrase? (This was the original issue for #660.)
So far, there has been no objection to the proposed change from single-quotes to double-quotes. Also, no one has argued that both single-quotes and double-quotes should be allowed. Unless we hear otherwise, I assume the majority are ok with the proposal.

(4) Character escape
Florent articulated very well that the fulltext grammar should be independent of the SQL-level grammar. If we accept that, then SQL-level escapes are needed so that the CONTAINS() function and its individual arguments can be parsed correctly regardless of the fulltext syntax. (In that regard, we probably need to escape ",", "'", and ")" in the text-search-expression.) After SQL-level parsing, that is, once the SQL-level escapes have been removed or masked from the extracted text-search-expression, fulltext-level escapes are needed to ensure that the expression itself is parsed correctly. In other words, SQL-level escapes and fulltext escapes are different beasts. If we agree to this, we need to describe it clearly in the spec.
To construct a query statement, the reverse is done: a grammatically correct text-search-expression is written first, using fulltext escapes if needed. Then SQL-level escapes are added to this expression (if necessary) before the marked-up expression is inserted into the CONTAINS() function. The resulting syntax may not be as easy for a human user to read as one would like, but syntactic correctness (for machine processing) is important in order to avoid ambiguity and ensure interoperability.
Comments from the TC?
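
To make the two layers in (4) concrete, here is a minimal Python sketch of the build and parse directions. It is illustrative only, not spec text. The escape conventions are assumptions chosen for the example: fulltext phrases use double-quotes (per the proposal in (3)) with backslash escapes, and the SQL level doubles single-quotes as in SQL-92 string literals. All function names are hypothetical.

```python
def fulltext_phrase(words: str) -> str:
    """Fulltext level (assumed convention): wrap a phrase in double-quotes,
    backslash-escaping any backslash or double-quote inside it."""
    escaped = words.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def sql_escape(expr: str) -> str:
    """SQL level (assumed convention): double each single-quote, as in an
    SQL-92 string literal. Knows nothing about the fulltext grammar."""
    return expr.replace("'", "''")


def sql_unescape(literal_body: str) -> str:
    """Inverse of sql_escape: collapse doubled single-quotes."""
    return literal_body.replace("''", "'")


def build_contains(expr: str) -> str:
    """Construction order: fulltext expression first, SQL escapes second."""
    return "CONTAINS('" + sql_escape(expr) + "')"


def extract_expression(call: str) -> str:
    """Parsing order: strip the SQL layer first; the result is what the
    fulltext parser would then see. Handles only the unqualified form."""
    prefix, suffix = "CONTAINS('", "')"
    if not (call.startswith(prefix) and call.endswith(suffix)):
        raise ValueError("not a simple CONTAINS('...') call")
    return sql_unescape(call[len(prefix):-len(suffix)])


expr = fulltext_phrase("John's house")   # fulltext-level expression
query = build_contains(expr)             # SQL-level escapes applied on top
recovered = extract_expression(query)    # SQL-level parsing undoes them
```

Under these assumed conventions the SQL layer can be removed without any knowledge of the fulltext syntax, which is the independence Florent argued for.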

(5) Shall we adopt a subset of Lucene's syntax instead of inventing our own?
If we take this approach, then we need to make sure the text-search-expression syntax (without SQL-level escapes) is indeed a proper subset of Lucene's, including escapes and the definition of a phrase. This would also preempt any choice we make for (2), (3), and the fulltext portion of (4).
Comments?

> Clarification needed on the use of quotes in a CONTAINS() query to search on phrases
> ------------------------------------------------------------------------------------
>
>                 Key: CMIS-660
>                 URL: http://tools.oasis-open.org/issues/browse/CMIS-660
>             Project: OASIS Content Management Interoperability Services (CMIS) TC
>          Issue Type: Improvement
>          Components: Domain Model
>    Affects Versions: Draft 0.70
>            Reporter: Jane Doong
>            Assignee: Ethan Gur-esh
>            Priority: Minor
>
> Clarification needed on the use of quotes in a CONTAINS() query to search on phrases.
> Spec:
> 2575  BNF grammar structure:: CONTAINS ( [ <qualifier> ,] ' <text search expression> ' )
> 2413  <phrase> ::= <quote> <word> [ {<space> <word>} ... ] <quote>
> 2422  <quote> ::= "'" !! Single-quote only, consistent with SQL-92 string literal
> 2597  Within a word or phrase, each (single-)quote must also be escaped by a preceding backslash "\"
> The spec specifically states that <quote> is single-quote only.
> My question is about specifying a phrase inside CONTAINS().
> Since the entire text search expression is enclosed in single quotes, 
> I question whether a phrase should again be enclosed in single quotes, or should it be in double-quotes.
> According to spec:
>  Word search:    CONTAINS('house')
>  Phrase search : CONTAINS(' 'my house' ')
> Should phrases be in double-quotes? ==> CONTAINS(' "my house" ')

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://tools.oasis-open.org/issues/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

