search-ws message

Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

From: "Hammond, Tony" <t.hammond@nature.com>
To: "Matthew Dovey" <m.dovey@jisc.ac.uk>,"Ray Denenberg, Library of Congress" <rden@loc.gov>,"LeVan,Ralph" <levan@oclc.org>,"OASIS SWS TC" <search-ws@lists.oasis-open.org>
Date: Tue, 14 Dec 2010 09:06:15 -0000

Hi Matthew:

Thanks for your comments. Have to say that I agree with your point a) on assigning a new query type to a language subset (not unlike rdf/turtle being a subset of rdf/n3), but am less convinced about your point b), which has to do with presentation. In my opinion presentation (or wire encoding) is purely an SRU function and has nothing to do with CQL. SRU is the protocol whose job is to get the query across. Whether the query travels wholesale through a single "query" parameter or is split up over multiple querylets (can I use that word? :) as signalled by a "queryn" parameter does not affect the query itself.

Now I grant you that in the present proposal with a strict subset we have changed the query language. (In the earlier proposal with an arbitrary fragmentation and recombination there was no impact on CQL and thus I don't believe that would have changed the query type. But for reasons of omitting empty terms we need to move from an arbitrary fragmentation to a clause-related fragmentation.) So, in respect of a) I am willing to concede that query type is affected.

As regards inelegance, for sure. But I can't see any other way to do it. I have now modified proposals on our side to use a more minimal parameter naming syntax which is more Google-esque. So previously I had considered

queryn=2, q1.idx = ..., q1.rel = ..., q1.trm = ..., q1.bln = ...,
          q2.idx = ..., q2.rel = ..., q2.trm = ..., q2.bln = ...

In the interests of minimizing querystring length I would now propose the following:

queryn=2, qi1 = ..., qr1 = ..., qt1 = ..., qb1 = ...,
          qi2 = ..., qr2 = ..., qt2 = ..., qb2 = ...

This could be seen as being more cryptic but I believe the consistency and compactness of naming compensates for the verbatim naming earlier. (Remember we are talking about URI querystrings and not XML documents and brevity is a virtue in this space.)
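
To make the renaming concrete, here is a minimal sketch of how the dotted names could be mapped onto the compact ones. The helper names "compactName" and "compactParams" are hypothetical and not part of any SRU draft; the only assumption taken from the text above is that "q{n}.idx", "q{n}.rel", "q{n}.trm" and "q{n}.bln" become "qi{n}", "qr{n}", "qt{n}" and "qb{n}".

```javascript
// Hypothetical helpers, for illustration only.
// Maps the dotted column suffixes onto the single-letter compact form.
const SHORT = { idx: 'i', rel: 'r', trm: 't', bln: 'b' };

// "q1.idx" -> "qi1"; any other parameter name (e.g. "queryn") is returned as-is.
function compactName(name) {
  const m = /^q(\d+)\.(idx|rel|trm|bln)$/.exec(name);
  if (!m) return name;
  return 'q' + SHORT[m[2]] + m[1];
}

// Applies compactName to every key of a parameter object.
function compactParams(params) {
  const out = {};
  for (const [key, value] of Object.entries(params)) {
    out[compactName(key)] = value;
  }
  return out;
}
```

For example, compactName('q2.trm') gives 'qt2', while 'queryn' passes through unchanged.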

One possible normalization that could be applied would be to remove the booleans and have only a default boolean which would be applied between the clauses. This would have the merit of removing the trailing boolean and also of abbreviating the query length but the distinct demerit of not allowing boolean relationships to be expressed. And in the vanilla search form at

    http://www.nature.com/opensearch/request

which uses this fragmentation approach we do have support for boolean operators.

So, I would suggest to retain the boolean operator even though it would bulk out the queries.

Further if this proposal were taken any further I would suggest that even if the queryType were changed then it would already be implied by the queryn parameter and would not need to be formally expressed.

That's as far as I can get with this at present. I do need to apply something similar to our own search forms. It would have been good to have employed a standard or best practice approach. At least this CQL Sequencing proposal has been an attempt to make the input query syntax - as mediated by a web form - conformant with SRU/CQL.

Tony

-----Original Message-----
From: Matthew Dovey [mailto:m.dovey@jisc.ac.uk]
Sent: Mon 12/13/2010 12:17 PM
To: Hammond, Tony; Ray Denenberg, Library of Congress; LeVan,Ralph; OASIS SWS TC
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

I've no problem seeing this as a different queryType. Let's face it, the differences between type-1 and type-101 queries in Z39.50 were far less than the differences here, yet we had no problem defining them as different types.

In this case

a) we have a strict subset of CQL (i.e. there will be CQL queries which cannot be expressed this way)
b) the syntax (by which I mean the on-the-wire encoding) is different, even though the semantics are identical to (the subset of) CQL

Both of those justify, in my mind, viewing this as a different query type - even though the semantics are derived from CQL and there is an easy mapping into CQL (for that matter, I regarded CQL and XCQL as different query types - albeit strongly related).

As regards the proposal itself - I'm ambivalent. I like the use-case and I accept the arguments that we need to support this case. I don't like the solution proposed however - I can't help thinking that there must be a more elegant way of solving this; but I don't have an alternative at the moment.

I'll keep thinking on it though.

Matthew

-----Original Message-----
From: Hammond, Tony [mailto:t.hammond@nature.com]
Sent: 08 December 2010 18:12
To: Ray Denenberg, Library of Congress; LeVan,Ralph; OASIS SWS TC
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Nope. I was talking about CQL (or a proper subset of CQL). I am not proposing another query syntax - but a technique for handling simple CQL as would commonly be generated from search web forms which is how most end users will interact with web-based search engines.

This is where I begin to worry about CQL/SRU. If it won't walk the web way it'll likely end up not walking very far. The web ultimately is a juggernaut. It won't care about niceties like constructing honed queries and inserting them just so.

Tony

-----Original Message-----
From: Ray Denenberg, Library of Congress [mailto:rden@loc.gov]
Sent: Wed 12/8/2010 4:47 PM
To: Hammond, Tony; Denenberg, Ray; 'LeVan,Ralph'; 'OASIS SWS TC'
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Yes I mean use the queryType parameter and define a new query type, where the new query is split into parameters along the lines you suggest. This would be a much less disruptive approach than to introduce new parameters into the protocol. But no, it's not "still CQL" strictly speaking, it would be a CQL query transformed into a different syntax. As far as indicating the number of rows, that could be part of the query, couldn't it?

--Ray

From: Hammond, Tony [mailto:t.hammond@nature.com]
Sent: Wednesday, December 08, 2010 11:20 AM
To: Denenberg, Ray; LeVan,Ralph; OASIS SWS TC
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Interesting idea. You mean to use the "queryType" parameter? It's still CQL right, although fractured. So I guess one could take advantage of that param to indicate that state of affairs instead of using a new param. But then one would still want to communicate the number of search clauses (rows) being generated. Where would that fit in?

My take was that "queryn" would provide the indicator (and thus proxy for "query" parameter), as well as providing the number of search clauses.

Tony

-----Original Message-----
From: Ray Denenberg, Library of Congress [mailto:rden@loc.gov]
Sent: Wed 12/8/2010 3:41 PM
To: Hammond, Tony; 'LeVan,Ralph'; Denenberg, Ray; 'OASIS SWS TC'
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Tony - could this be done instead as a separate query type?     --Ray

From: Hammond, Tony [mailto:t.hammond@nature.com]
Sent: Wednesday, December 08, 2010 3:45 AM
To: LeVan,Ralph; Denenberg, Ray; OASIS SWS TC
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Hi:

Before we dismiss this proposal out of hand a couple more words are in order:

> I'm not excited by adding this. It's a very uninteresting class of forms clients that can't do javascript.

We offer a commercial search service on our platform based on OpenSearch. We do use JavaScript liberally on the platform but cannot assume users will choose to do so. We therefore *require* a failover to server-side technologies. It's that simple.

> The only way they would appear that way to the server is if the user filled in all those fields in the form.

This is not the case. As I mentioned in my earlier message postscript the initial consideration was to fragment the CQL strings arbitrarily and then to recombine in strict sequence order. However, if term values were not supplied the resulting CQL string would be invalid.

Instead I modified this proposal to use a matrix approach whereby search clauses would be numbered sequentially - as rows - and the individual components (index, relation, term, boolean) would be the columns. Hence any search clause with an empty term value could be skipped entirely. And the terminal boolean is also always to be omitted from the reckoning. Index, relation and boolean values are supplied by the form. Only the term values are entered by a user and only those associated search clauses are ever considered.
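
As a rough illustration of this matrix approach, the sketch below reads the numbered parameters into rows, drops every row whose term is empty, and joins the survivors with their booleans, leaving out the boolean of the last surviving row. It assumes the "q{n}.idx" naming from this thread, assumes the form always supplies index, relation and boolean values, and does no quoting or escaping of terms. The function names "toRows" and "assembleQuery" and the index names in the example are hypothetical.

```javascript
// Sketch only: names follow the "q{n}.idx" convention discussed in this thread.
// Reads queryn and the numbered parameters into an array of row objects.
function toRows(params) {
  const n = parseInt(params.queryn, 10) || 0;
  const rows = [];
  for (let i = 1; i <= n; i++) {
    rows.push({
      idx: params[`q${i}.idx`],
      rel: params[`q${i}.rel`],
      trm: (params[`q${i}.trm`] || '').trim(),
      bln: params[`q${i}.bln`],
    });
  }
  return rows;
}

// Builds a CQL string from the rows that actually carry a term.
function assembleQuery(params) {
  const rows = toRows(params).filter((row) => row.trm !== '');
  let query = '';
  rows.forEach((row, k) => {
    query += `${row.idx} ${row.rel} ${row.trm}`;
    // The boolean of the last surviving row is never emitted.
    if (k < rows.length - 1) query += ` ${row.bln} `;
  });
  return query;
}

// Example with hypothetical index names; the second clause has no term.
const example = {
  queryn: '3',
  'q1.idx': 'dc.title', 'q1.rel': '=', 'q1.trm': 'cell', 'q1.bln': 'and',
  'q2.idx': 'dc.creator', 'q2.rel': '=', 'q2.trm': '', 'q2.bln': 'or',
  'q3.idx': 'dc.subject', 'q3.rel': '=', 'q3.trm': 'biology', 'q3.bln': 'and',
};
console.log(assembleQuery(example)); // dc.title = cell and dc.subject = biology
```

Joining with the boolean of the preceding surviving row is one possible design choice; a form could equally attach each boolean to the clause that follows it.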

We know that this approach works. This is what we currently use in the JavaScript used by our forms handler which comes with the "explainResponse.xsl" stylesheet that ships with the "oclcsrw" package. (Btw, must say the developer has done us proud here. Excellent job!) The only difference here is that I have amended the naming somewhat and also the intent. For naming I have suggested "q{n}.idx" for "index{n}", "q{n}.rel" for "relat{n}", etc. I have also proposed "queryn" for a user supplied value instead of the dynamically computed "maxItems", since this accords better with "query". (Recall too that "query" - and "queryn" if adopted - is the signal for a "searchRetrieve" operation.)

The intent too is different in that the current JavaScript is destined for one form only whereas this approach is proposed as a general method to be used by web forms for client-side reassembly using JavaScript or failing over to server-side reassembly. Most search web forms fit to this simple type of matrix description however the presentation is crafted.

One of the main problems with SRU adoption is the difficulty of constructing the CQL querystring which must be presented intact. I cannot emphasize this point enough. SRU currently does not allow for fragmented CQL querystrings which are what a forms interface naturally provides for and which is the primary means for an end user to interact with an SRU endpoint. Also even if fragmented query components were supported there is still the difficulty of reconciling the CQL triple (index, relation, term) with the basic key/value pair. (Other query languages pay less heed to relations and so map more readily index and term to key/value pairings.)

This proposal could be accommodated within a non-normative annex as a general technique for dealing with web forms. However if there is no obligation on a server to recognize this technique then it cannot be safely relied upon and so must necessarily limit the range of clients that SRU can (or is willing to) support. It may be that SRU will only support JavaScript enabled clients.

We ought to be worried.

Tony

ps/
I had this all written out much better (in English too) but lost the whole text and had to rewrite.

-----Original Message-----
From: LeVan,Ralph [mailto:levan@oclc.org]
Sent: Wed 12/8/2010 5:26 AM
To: Hammond, Tony; Ray Denenberg, Library of Congress; OASIS SWS TC
Subject: RE: [search-ws] queryn: A proposal for SRU to facilitate forms processing

I'm not excited by adding this. It's a very uninteresting class of forms clients that can't do javascript.

But, mostly I don't think it works. The google example uses unrelated parameter names for the parts of the query. In Tony's example, the parts are numbered sequentially. The only way they would appear that way to the server is if the user filled in all those fields in the form. What if fields are omitted? Those fields (with their sequential names) would not be sent. We'd have gaps in the numbering. What if there were more Booleans than operands?

Let's not.

Ralph

From: Hammond, Tony [mailto:t.hammond@nature.com]
Sent: Tuesday, December 07, 2010 11:30 AM
To: Ray Denenberg, Library of Congress; OASIS SWS TC
Subject: [search-ws] queryn: A proposal for SRU to facilitate forms processing

Hi:

I wanted to put this (modest) proposal for SRU forward and get some feedback.

One of the differences between SRU and other general search interfaces is that the actual query (CQL string) is contained within a single parameter and not scattered across several parameters, as e.g. this search in Google:

http://www.google.co.uk/search?q=this+-that=en=10==i=countryAU=images=qdr:w

This is a query for "this" and not "that" in Australian sites in the past week.

&q=this+-that
&cr=countryAU
&tbs=qdr:w

Yep, it's a bit of a mess. :) Mixes together query and control params.
But still it's straightforward to map to from a forms interface. I always think of traditional query interfaces as being 1-D and SRU as being 2-D: one dimension for query, and the other for control. And this separation of concerns is both a blessing and a curse. A curse especially for implementors.

Now one of the difficulties with a forms input for SRU is that the CQL query needs to be composed before it is added to the querystring as a single parameter which usually means some clever stylesheet handling of the query fields (which we are currently using from the oclcsrw package) or some other preprocessing method.

I was wondering whether SRU could have a new parameter, "queryn" say, giving the integer number of search clauses across which the query is fragmented, so that the query could simply be recomposed in a predetermined fashion.

E.g. if one had something like:

&queryn=2
&q1.idx=index1
&q1.rel=relation1
&q1.trm=term1
&q1.bln=boolean1
&q2.idx=index2
&q2.rel=relation2
&q2.trm=term2
&q2.bln=boolean2

then the parameters could be sent direct from the form without any handling and composed on the server side by following a simple rule, i.e. concatenation of (known number of) search clause components with whitespace separators, and concatenation of search clauses with (whitespaced) booleans. So, in above example with n=2 params it would be straightforward for a querystring builder to look for params "q1.*" through "q2.*" and build the CQL query as

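// params maps form parameter names to values, e.g. params['q1.idx']
var query = '', bln = '';
for (var i = 1; i <= queryn; i++) {
    var trm = params['q' + i + '.trm'];
    if (!trm) { continue; }  // skip any clause whose term is empty
    // join with the boolean of the previous clause that was kept
    if (query) { query += ' ' + bln + ' '; }
    query += params['q' + i + '.idx'] + ' ' + params['q' + i + '.rel'] + ' ' + trm;
    bln = params['q' + i + '.bln'];
}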

i.e.

query = q1.idx + ' ' + q1.rel + ' ' + q1.trm + ' ' + q1.bln + ' ' + q2.idx + ' ' + q2.rel + ' ' + q2.trm

As long as a form laid out query components in a defined (numbered) fashion and then declared the total number of search clauses then the query builder just needs to iterate over the known number of search clauses.

Alternately the query could be assembled on the client using JavaScript such as the "mungeForm" function we have on nature.com OpenSearch via the oclcsrw package. And if a client had disabled JavaScript then the server itself could detect the "queryn" parameter and reassemble the query. Of course this really means that

searchRetrieve = query | queryn (=> query = q1.* + q2.* + ...)
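
The rule above could be sketched on the server side roughly as follows. This is only an illustration under stated assumptions: "resolveQuery" is a hypothetical name, "params" is assumed to be a plain object of already-decoded request parameters, and "assemble" stands for whatever reassembly routine the server uses for the fragmented form.

```javascript
// Sketch: decide how a searchRetrieve request carries its query.
// `assemble` is a caller-supplied function that rebuilds CQL from the q{n}.* parameters.
function resolveQuery(params, assemble) {
  if (typeof params.query === 'string' && params.query !== '') {
    return params.query;       // ordinary SRU: the CQL query arrives intact
  }
  if (params.queryn !== undefined) {
    return assemble(params);   // fragmented form: reassemble on the server
  }
  return null;                 // neither parameter present: not a searchRetrieve
}

// Usage with a trivial stand-in for the reassembly step.
const standIn = (p) => `${p['q1.idx']} ${p['q1.rel']} ${p['q1.trm']}`;
console.log(resolveQuery({ query: 'dc.title = cell' }, standIn));
console.log(resolveQuery({ queryn: '1', 'q1.idx': 'dc.title', 'q1.rel': '=', 'q1.trm': 'cell' }, standIn));
```

Giving "query" precedence when both parameters are present is an assumption of this sketch; the proposal itself does not say which should win.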

Such an extension to SRU could certainly provide ample support for simple forms - such as most in practice invariably are - without requiring special JavaScript or bespoke handling. Of course, it is very limiting in terms of query expressivity although it does map reasonably well to standard form inputs.

What do you think? Interested to hear your feedback on this general approach to (re)assembling CQL queries.

Thanks,

Tony

ps/
In an earlier attempt I had considered just breaking a CQL query into an arbitrary number of string fragments which could be resequenced into a complete CQL string but ran into a problem concerning empty terms which would break the validity of the CQL. Hence this revised approach which is more of a matrix method with index, relation, term (and boolean) correlated and identified by row order.

************************************************************************
********
DISCLAIMER: This e-mail is confidential and should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage mechanism. Neither Macmillan Publishers Limited nor any of its agents accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Macmillan Publishers Limited or one of its agents.
Please note that neither Macmillan Publishers Limited nor any of its agents accept any responsibility for viruses that may be contained in this e-mail or its attachments and it is your responsibility to scan the e-mail and attachments (if any). No contracts may be concluded on behalf of Macmillan Publishers Limited or its agents by means of e-mail communication.
Macmillan
Publishers Limited Registered in England and Wales with registered number 785998 Registered Office Brunel Road, Houndmills, Basingstoke RG21 6XS
************************************************************************
********
