
Subject: Re: [search-ws] Groups - Contextual Query Language (cql-first-draft-april-6-2009.doc) uploaded
From: "Ray Denenberg, Library of Congress" <rden@loc.gov>
To: <search-ws@lists.oasis-open.org>
Date: Mon, 20 Apr 2009 16:11:46 -0400
Thanks, Nick, for the comments.  I've gone through them all and will issue a 
new draft soon.  Comments are discussed below; those not covered were 
straightforward and have been addressed in the next draft.   --Ray

----- Original Message ----- 
From: "Nick Nicholas" <opoudjis@optushome.com.au>
To: <search-ws@lists.oasis-open.org>
Sent: Monday, April 13, 2009 2:36 AM
Subject: Re: [search-ws] Groups - Contextual Query Language 
(cql-first-draft-april-6-2009.doc) uploaded


> 76: "The empty search term [example d] has no defined semantics."

Eliminated this prose and the spec is simply silent on the matter.


> 56: Context sets are repeatedly mentioned in the document, but are not 
> introduced until line 193. This is confusing, and I don't see why 2.3 
> can't move as is to the start of the document.

I have completely reworked the relevant parts of the spec to address this.


> 86: "If multiple '.' characters are present, then the first should be 
> treated as the prefix/base name delimiter". This means that a context set 
> name cannot start with a dot?

Right. Why would you want it to?
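Read literally, the rule quoted from line 86 splits an index name at the first '.', so everything after it belongs to the base name. A minimal Python sketch of that reading follows; the function name `split_index_name` and the sample index `x.y.z` are hypothetical, not taken from the spec. It also shows why a leading dot would leave an empty prefix.

```python
from typing import Optional, Tuple


def split_index_name(index: str) -> Tuple[Optional[str], str]:
    """Split a CQL index name into (prefix, base name) at the first '.'.

    Returns (None, index) when there is no '.', i.e. the prefix is
    left for the server to determine.
    """
    prefix, sep, base = index.partition(".")  # splits at the FIRST '.' only
    if not sep:
        return None, index
    return prefix, base


print(split_index_name("dc.title"))  # ('dc', 'title')
print(split_index_name("x.y.z"))     # ('x', 'y.z')
print(split_index_name("title"))     # (None, 'title')
print(split_index_name(".hidden"))   # ('', 'hidden')
```

Under this reading, a context set name beginning with a dot would always be parsed as an empty prefix.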


>
> 87: "If the prefix is not supplied, it is determined by the server".  Say 
> explicitly it is cql.anyIndexes.  Why is there a distinction  between 
> cql.anyIndexes and cql.serverChoice?

For discussion, next call.


> 131: "the prox operator is". Sentence incomplete. (I'm not really 
> surprised... :-) Forward reference to 2.1.9?
>
> 149: "Within the CQL set they [proximity terms] are explicitly undefined, 
> subject to interpretation by the server." How can I find out how the 
> server has chosen to interpret them? I find the refusal to define their 
> behaviour a concern. People will make the obvious orthographically-based 
> assumptions about the meaning of word, sentence, paragraph, and will not 
> be happy if that is not what's implemented.

I think the idea is that if you want a specific interpretation, you should use a 
context set. If we want the server to be able to explain how it interprets 
them, that could be part of Explain. However, I think the further idea is 
that the server's interpretation could depend on certain aspects of the 
query, and thus differ from one query to another.  That would be hard to 
explain.

For discussion at next call.

> BNF: comparator, not comparitor
I'll leave this for whoever volunteers to work on the BNF (we had a big 
debate on "comparitor" vs. "comparator" the last time we discussed the BNF).

> 203: "When defining a new context set, it is necessary to provide a 
> description of the semantics of each item within it". No minimum 
> requirements for this description are provided.

I've deleted the sentence.


> 234: "cql.resultSetId = "a" AND cql.resultSetId = "b" " I'm surprised 
> this works, since the instance of the record is unique to a result set. 
> Does the wording of the response set data model explicitly license such 
> result set manipulation?

This assumes that cql is being used with a protocol that declares a result 
set model. I've added a note to that effect.
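To illustrate what such a result set model would need to provide, here is a minimal Python sketch. It assumes, hypothetically, a protocol in which each named result set is a set of record identifiers that stay stable across queries; under that assumption the AND of two `cql.resultSetId` clauses is simply the intersection of the two sets. The names `result_sets` and `result_set_id_and` and the record ids are illustrative only.

```python
from typing import Dict, Set

# Hypothetical result set model: record identifiers persist across
# result sets, so the same record can be recognised in "a" and "b".
result_sets: Dict[str, Set[str]] = {
    "a": {"rec1", "rec2", "rec3"},
    "b": {"rec2", "rec3", "rec4"},
}


def result_set_id_and(left: str, right: str) -> Set[str]:
    """Evaluate: cql.resultSetId = left AND cql.resultSetId = right."""
    return result_sets[left] & result_sets[right]  # set intersection


print(sorted(result_set_id_and("a", "b")))  # ['rec2', 'rec3']
```

Without stable record identity across result sets (Nick's concern), the intersection has no well-defined meaning.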


> 244: "allIndexes". Remind readers that this is not equivalent to a full 
> text search.

For discussion at next call.  I'm not sure it wouldn't be reasonable for a 
server to treat this as full text search.


> 258: "keywords". Note that the search terms in the keywords index need 
> not be present in any other defined index.

I agree; however, it says "Exactly which fields make up this index is 
determined by the server," which implies that the index is constructed from 
other indexes.

For discussion at next call.


> 271. "=". How can I find out how a server has implemented "="?

From Explain (which Ralph and Janifer are working on).


>
> 284. "==". Remind readers that CQL does not strip whitespace, so the 
> index better had.

For discussion at next call.



> 301. "<" etc. I would insert a textual comparison example, since textual 
> comparisons are defined (subject to the locale), and comparisons are not 
> limited to numbers.

I'm not sure that the CQL set is where lexical relations should be defined. 
We discussed this a couple of years ago, and there was mention of a 
"lexical" context set.

For discussion at next call.


>
> 311. "adj". You've dodged any mention of word delimiters in the adjacency 
> definition, but clearly adjacency is meaningless without the notion of a 
> delimiter. The delimiter, again, is determined by the locale.

The delimiter is intended to be understood as a space.  We could add a 
relation modifier "delimiter=xx".
For discussion.
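A minimal Python sketch of adjacency under the space-delimiter reading. The `delimiter` parameter only mirrors the proposed "delimiter=xx" modifier, which is not yet in the spec; the function name `adj_match` is hypothetical, and case folding and punctuation are deliberately ignored.

```python
def adj_match(text: str, phrase: str, delimiter: str = " ") -> bool:
    """True if the words of `phrase` occur consecutively in `text`.

    Words are obtained by splitting on `delimiter` (a space by default);
    empty tokens produced by repeated delimiters are discarded.
    """
    words = [w for w in text.split(delimiter) if w]
    target = [w for w in phrase.split(delimiter) if w]
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the text.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(adj_match("the quick brown fox", "quick brown"))             # True
print(adj_match("the quick brown fox", "quick fox"))               # False
print(adj_match("quick-brown-fox", "quick brown"))                 # False
print(adj_match("quick-brown-fox", "quick-brown", delimiter="-"))  # True
```

The third call shows the locale issue Nick raises: with a space-only delimiter, hyphenated text never yields separate words.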

>
> 352. "stem". Being the same stem is different from being the same lemma, 
> and often lemma is what you actually want (e.g. "computer" and 
> "computers" but not "computing".) I'd make the distinction here --- 
> especially for languages not as morphology-poor as English.

For discussion.
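To make the stem/lemma distinction concrete, here is a toy Python sketch using Nick's own example words. Both the suffix-stripping stemmer and the lemma table are hypothetical illustrations, not a real algorithm such as Porter's and not a real lexicon.

```python
# Toy suffix-stripping stemmer (hypothetical; not Porter or any real stemmer).
SUFFIXES = ("ing", "ers", "er", "s")  # longer suffixes are tried first


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters of stem.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Toy lemma table (hypothetical).
LEMMAS = {"computer": "computer", "computers": "computer", "computing": "compute"}


def toy_lemma(word: str) -> str:
    return LEMMAS.get(word, word)


words = ["computer", "computers", "computing"]
print([toy_stem(w) for w in words])   # ['comput', 'comput', 'comput']
print([toy_lemma(w) for w in words])  # ['computer', 'computer', 'compute']
```

Matching on stems conflates all three words, while matching on lemmas groups only "computer" and "computers".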


>
> 362. "partial". Word fragments could also usefully be searched in normal 
> searches.

Example, please.


>
> 375. "locale=value". Rather than giving illustrative examples, say that 
> locales are used as understood under Unix, and refer to a more canonical 
> listing (in whatever gizzards of BSD or Java that might reside). It'd be 
> nice if you could move away from "C" as a locale and just used ISO, but 
> that's a big ask....

Someone else can write this section since I don't understand it.

>
> 495. "container=field" sits clumsily with how indexes are normally 
> defined. Is a query like "author = jack prox author = jones" 
> well-defined? Is "author = jack prox title = jones"? (It shouldn't be.) Is 
> "author = jack prox/container=title author = jones"? (Again it shouldn't 
> be).

The latter two are not well-defined, and the first is debatable.  But the 
examples are well-defined, I think:
             name=jones prox/container=author date=1950
Find the name 'jones' and date '1950' in the same author field.
The semantics of this query are clear, and it is up to the server to 
determine how to process it.  It doesn't mean that there necessarily are 
name and date subfields within the author field, though there may be; 
alternatively, the server may apply some algorithm to determine which part 
is the date and which is the name.
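One way a server might process `name=jones prox/container=author date=1950` without name and date subfields is sketched below in Python. It assumes, hypothetically, that each author field is stored as a single heading string such as "Jones, Mary, 1950-", and that the server simply checks whether both terms occur as tokens within the same author field. The function names and sample data are illustrative only.

```python
import re
from typing import List


def tokens(field: str) -> List[str]:
    # Lowercase, then split on any run of characters that is not a-z or 0-9.
    return [t for t in re.split(r"[^0-9a-z]+", field.lower()) if t]


def prox_same_container(fields: List[str], term1: str, term2: str) -> bool:
    """True if some single field (container) contains both terms."""
    return any(term1 in toks and term2 in toks
               for toks in (tokens(f) for f in fields))


# Hypothetical record with two author fields.
record_authors = ["Jones, Mary, 1950-", "Smith, John, 1948-"]
print(prox_same_container(record_authors, "jones", "1950"))  # True
print(prox_same_container(record_authors, "jones", "1948"))  # False
```

The second query fails because "jones" and "1948" occur only in different author fields. Note that this sketch does not try to tell names from dates; a server with real subfields could be stricter.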