OASIS Mailing List Archives

cmis message

Subject: [OASIS Issue Tracker] Commented: (CMIS-580) CONTAINS escaping needs additional clarification


    [ http://tools.oasis-open.org/issues/browse/CMIS-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=28298#action_28298 ] 

Jane Doong commented on CMIS-580:
---------------------------------

This JIRA issue is to update V1.1 section 2.1.12 Query to use double quotes for full text search of phrases. The changes for escaping characters (CMIS-580) and for using double quotes for phrases (CMIS-660) were approved, but did not get into the V1.0 errata.

I am going to update and clarify the necessary changes to be applied to the V1.1 draft.

---------------
1.	Change 2.1.12.2.1 BNF Grammar, using Florent's CMIS-580 proposal. His proposal not only added double quotes for phrases, but also listed how to escape a single quote character with another single quote. I would like to amend the proposal to add back 'backslash-single quote' as a valid word element, to be backwards compatible with V1.0.

Proposal: 
In BNF, change: 
  <word> ::= <word element> {<word element>} 
  <phrase> ::= <double quote> <word> {<space> <word>} <double quote> 
And add: 
  <double quote> ::= " !! U+0022 
  <backslash> ::= \ !! U+005C
  <quote symbol> ::= <quote><quote> | <backslash><quote>
  <word element> ::= <char> - <space char> - <quote> - <double quote> | <quote symbol> 
And remove the now unused: 
  <non space char>
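
As a rough illustration of the proposed rules, the sketch below turns <word> and <phrase> into regular expressions. The names (WORD_ELEMENT, is_word, is_phrase) are hypothetical, and it assumes <space char> is only U+0020; it is not a full parser for the text search expression.

```python
import re

# One <word element>: a doubled single quote, a backslash-quote, or any
# character other than space, single quote, or double quote.
# Assumption: <space char> is only U+0020.
WORD_ELEMENT = r"(?:''|\\'|[^ '\"])"
# <word> ::= <word element> {<word element>}
WORD = rf"{WORD_ELEMENT}+"
# <phrase> ::= <double quote> <word> {<space> <word>} <double quote>
PHRASE = rf'"{WORD}(?: {WORD})*"'

WORD_RE = re.compile(WORD)
PHRASE_RE = re.compile(PHRASE)


def is_word(text: str) -> bool:
    """Return True if text is a single <word> under the proposed grammar."""
    return WORD_RE.fullmatch(text) is not None


def is_phrase(text: str) -> bool:
    """Return True if text is a double-quoted <phrase> under the proposed grammar."""
    return PHRASE_RE.fullmatch(text) is not None
```

Under these assumptions, John''s and John\'s are both accepted as words, while a bare single quote inside a word is rejected.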

------------------
2. Change 2.1.12.2.4.6 CONTAINS() predicate function, to use double quotes for phrases:

"A phrase is defined as a word or group of words. A group of words
must be surrounded by <insert>double</insert> quotes to be considered a single phrase."

------------------
3. Change 2.1.12.2.4.6 CONTAINS() predicate function, to allow using doubled-up single quotes to escape a single quote.  The new sentence added here is duplicated from 2.1.12.3 Escaping.  In effect, both a backslash and a single quote character may now be used to escape the single quote.

"Within a word or phrase, each (single-)quote must also be escaped by a preceding
backslash "\".

<insert>Using double single-quotes ('') as a SQL-92 way to escape a literal single-quote (') character SHOULD BE supported as an allowable alternative to the double character \'.</insert>"
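
A minimal sketch of what accepting both escape forms could look like once a word has been isolated from the text search expression. The function name unescape_quotes is hypothetical, and the sketch deliberately ignores any other backslash sequences.

```python
import re

# Either escape form for a literal single quote: '' (SQL-92 style) or \'.
_QUOTE_SYMBOL = re.compile(r"''|\\'")


def unescape_quotes(word: str) -> str:
    """Replace each '' or \\' in word with one literal single quote."""
    return _QUOTE_SYMBOL.sub("'", word)
```

Both John''s and John\'s would come out as John's, so a repository that accepts either form sees the same search term.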

----------------
4. In 2.1.12.3 Escaping, remove the 2nd example, which showed how to escape single-quoted phrases. It is no longer needed, since phrases are now enclosed in double quotes and do not need to be escaped.  More escaping examples may be added, but in the context of this JIRA issue, this example is no longer valid.

<remove>...SELECT ... FROM ... WHERE ... CONTAINS('\'Content Management\'') ...</remove>
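
For comparison, here is a hedged sketch of how a phrase might be written under the proposal in this comment, with double quotes delimiting the phrase inside the single-quoted string argument. The helper build_contains_phrase is hypothetical. It escapes single quotes with a backslash and, as a simplification, rejects words containing spaces, double quotes, or backslashes.

```python
def build_contains_phrase(words):
    """Return a CONTAINS(...) fragment for one double-quoted phrase."""
    escaped = []
    for word in words:
        # Simplification: refuse characters whose handling is outside this sketch.
        if not word or any(ch in word for ch in (" ", '"', "\\")):
            raise ValueError(f"unsupported word: {word!r}")
        # Escape each single quote with a preceding backslash.
        escaped.append(word.replace("'", "\\'"))
    phrase = '"' + " ".join(escaped) + '"'
    return f"CONTAINS('{phrase}')"
```

With this helper, the words Content and Management yield CONTAINS('"Content Management"'), with no escaping needed around the phrase itself.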


> CONTAINS escaping needs additional clarification
> ------------------------------------------------
>
>                 Key: CMIS-580
>                 URL: http://tools.oasis-open.org/issues/browse/CMIS-580
>             Project: OASIS Content Management Interoperability Services (CMIS) TC
>          Issue Type: Bug
>          Components: Domain Model
>    Affects Versions: Committee Draft 04, CD04 Substantial Changes
>            Reporter: Ryan McVeigh
>            Assignee: David Choy
>             Fix For: Committee Draft 05
>
>
> See CMIS-530 and CMIS-567 for additional details.
> The full text search string has two problems.  First, it uses internally a quoted string to delimit phrases.  Second, as there is now a special meaning for the quote, it needs an additional mechanism to escape this character.
> To avoid confusion (hopefully), and make parsing possible, I would propose that the phrase delimiter character be the double quote character (").
> This leaves us two options for escaping this character:
>   1) Don't do it - it is just not possible to search for a " within a word or phrase
>   2) Escape it in a similar way as is done with LIKE:  The sequences \" and \\ represent the single characters " and \, respectively.  All other uses of \ are an error.  An unescaped instance of " delimits a phrase.
> If we chose the single-quote character as the phrase delimiter, we get into the need to escape the escaped character, etc.  Choosing double quote as the delimiter makes the entire parsing of the text search expression orthogonal to what is normally done for escaping strings ('').
> Note also that Google, for example, uses a double-quote to delimit a phrase.  I think this usage of double quotes to delimit phrases in full text search is fairly common.
> I am not sure what Florent means (in CMIS-529) by "With unescaped content matching <text search expression>".  Although, as the current BNF uses <quote> to delimit a phrase, and I think some other issue may have reset <quote> to just a single quote - so this either means:
>    1) there is no escaping of '' done for this string (not sure how it will parse, in that case)
>    2) after escaping (being "unescaped"?) the text search expression is parsed from the string.  This still leaves the question of escaping the quote character (which adds two levels of escape parsing).
> This behavior should be documented in section 2.10.2.4.3 "CONTAINS() predicate function", so it is clear that the usage and escaping of double-quotes is specific to a string used as the argument to this predicate.
