
Subject: [OASIS Issue Tracker] Created: (CMIS-580) CONTAINS escaping needs additional clarification

CONTAINS escaping needs additional clarification

                 Key: CMIS-580
                 URL: http://tools.oasis-open.org/issues/browse/CMIS-580
             Project: OASIS Content Management Interoperability Services (CMIS) TC
          Issue Type: Bug
          Components: Domain Model
    Affects Versions: Committee Draft 04
            Reporter: Ryan McVeigh
            Assignee: Ethan Gur-esh

See CMIS-530 and CMIS-567 for additional details.

The full text search string has two problems.  First, it internally uses a quoted string to delimit phrases.  Second, because the quote now has a special meaning, an additional mechanism is needed to escape that character.

To avoid confusion and make parsing possible, I propose that the phrase delimiter be the double-quote character (").

This leaves us two options for escaping this character:
  1) Don't do it - it is just not possible to search for a " within a word or phrase
  2) Escape it in a similar way as is done with LIKE:  The sequences \" and \\ represent the single characters " and \, respectively.  All other uses of \ are an error.  An unescaped instance of " delimits a phrase.
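
For illustration, option 2 could be sketched roughly as follows.  This is only a minimal sketch of the escaping rule proposed above, not the CMIS BNF: the function name tokenize_text_search, the token labels, and the choice to split words on whitespace are assumptions made for the example.

```python
def tokenize_text_search(expr: str) -> list[tuple[str, str]]:
    """Split a text search expression into ("word", s) and ("phrase", s) tokens.

    Assumed rules (option 2): \\" and \\\\ stand for a literal " and \\,
    any other use of \\ is an error, and an unescaped " delimits a phrase.
    """
    tokens: list[tuple[str, str]] = []
    buf: list[str] = []
    in_phrase = False
    i = 0

    def flush_word() -> None:
        # Emit the pending characters as a word token, if there are any.
        if buf:
            tokens.append(("word", "".join(buf)))
            buf.clear()

    while i < len(expr):
        ch = expr[i]
        if ch == "\\":
            # Only \" and \\ are legal escape sequences.
            if i + 1 < len(expr) and expr[i + 1] in ('"', "\\"):
                buf.append(expr[i + 1])
                i += 2
                continue
            raise ValueError(f"invalid escape at position {i}")
        if ch == '"':
            if in_phrase:
                # Closing delimiter: emit the phrase collected so far.
                tokens.append(("phrase", "".join(buf)))
                buf.clear()
            else:
                # Opening delimiter: finish any word in progress first.
                flush_word()
            in_phrase = not in_phrase
        elif ch.isspace() and not in_phrase:
            flush_word()
        else:
            buf.append(ch)
        i += 1

    if in_phrase:
        raise ValueError("unterminated phrase")
    flush_word()
    return tokens
```

With these assumptions, the input foo "bar baz" qu\"x yields the word foo, the phrase bar baz, and the word qu"x, while a\b is rejected as an invalid escape.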

If we chose the single-quote character as the phrase delimiter, we would need to escape the escape character, and so on.  Choosing the double quote as the delimiter makes parsing of the text search expression orthogonal to the escaping normally done for string literals ('').

Note also that Google, for example, uses a double quote to delimit a phrase.  I think this use of double quotes to delimit phrases in full text search is fairly common.

I am not sure what Florent means (in CMIS-529) by "With unescaped content matching <text search expression>".  The current BNF uses <quote> to delimit a phrase, and I think some other issue may have reset <quote> to just a single quote, so this means either:
   1) no escaping of '' is done for this string (I am not sure how it would parse in that case), or
   2) the text search expression is parsed from the string after the string-literal escaping has been undone (that is, after it has been "unescaped").  This still leaves the question of escaping the quote character within the expression, which adds two levels of escape parsing.
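
To make the two levels concrete, here is a rough sketch of how a client might build the CONTAINS argument if double quotes delimit phrases (as proposed in this issue) and the string literal itself doubles single quotes.  The helper names are hypothetical, and both escaping rules are assumptions taken from this proposal rather than from the specification text.

```python
def escape_for_contains(text: str) -> str:
    # Level 1 (text search expression): escape \ first, then ".
    return text.replace("\\", "\\\\").replace('"', '\\"')


def sql_string_literal(value: str) -> str:
    # Level 2 (query string literal): double any single quote (assumed rule).
    return "'" + value.replace("'", "''") + "'"


def contains_phrase(phrase: str) -> str:
    # Wrap the escaped phrase in double quotes, then quote it as a literal.
    return "CONTAINS(" + sql_string_literal('"' + escape_for_contains(phrase) + '"') + ")"
```

Under these assumptions, the phrase it's a "test" becomes CONTAINS('"it''s a \"test\""').  A server would reverse the steps in the opposite order: first undo the string-literal escaping, then parse the text search expression.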

This behavior should be documented in the section "CONTAINS() predicate function", so that it is clear that the usage and escaping of double quotes is specific to a string used as the argument to this predicate.
