search-ws message

Subject: 2.0 features (draft message to SRU implementors)

From: "Ray Denenberg, Library of Congress" <rden@loc.gov>
To: <search-ws@lists.oasis-open.org>
Date: Tue, 26 Aug 2008 11:47:17 -0400

Following is a draft message that I propose to post to the SRU list. This
is for discussion at tomorrow's call. We should discuss what else to include
in the message, and whether to split it into multiple messages so that no
single message is overloaded.


******* DRAFT MESSAGE ***************

SRU Implementors:

The OASIS Search Web Services Technical Committee invites your participation
in the development of SRU/CQL 2.0.

We have begun accumulating suggestions for 2.0 features. Additional
suggestions are welcome.  We are also currently gathering requirements for
geospatial and LOM applications.

Among the suggestions are:

1. Allow Non-XML Record Representations
Many formats do not map easily into XML, for example multimedia, images, and
even complex text formats. Allow non-XML serialized data in the response, or
allow a record to be returned by reference. These would be signaled by
additional values for the recordPacking parameter, for example
recordPacking="base64" or recordPacking="uri".
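As an illustration only, the following Python sketch shows how a client might request and unpack such records. The base URL, the helper names build_request and decode_record, and the handling of the proposed "base64" and "uri" values are assumptions made for this example; nothing here is defined by SRU today. The operation and version parameters follow SRU 1.2 usage.

```python
import base64
from urllib.parse import urlencode


def build_request(base_url, query, record_packing):
    """Build a searchRetrieve URL carrying a (proposed) recordPacking value."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        # "base64" and "uri" are the proposed values, not current SRU values.
        "recordPacking": record_packing,
    }
    return base_url + "?" + urlencode(params)


def decode_record(record_data, record_packing):
    """Return raw bytes for base64 records, or the URI itself for by-reference records."""
    if record_packing == "base64":
        return base64.b64decode(record_data)
    if record_packing == "uri":
        # The caller is expected to dereference the URI separately.
        return record_data
    raise ValueError("unsupported recordPacking: " + record_packing)


# Hypothetical server address, used only for illustration.
url = build_request("http://example.org/sru", 'dc.title = "fish"', "base64")
```

With "uri", the response would stay small and the client would fetch the record only if it needs it; with "base64", the record travels inline at the cost of a larger response.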

2. Proximity
Deprecate the PROX boolean operator and instead represent proximity by
adding a relation, 'window'. Examples:
· dc.title window/distance<5/unit=word "fries salt vinegar"
  (fries, salt, and vinegar all within a span of 5 words)
· dc.title window/distance<5/unit=word ((fish and fries) and (salt or
  vinegar))
  (fish and fries and one of salt or vinegar, in a 5 word window)
· dc.title window/distance=2/unit=word/ordered "fries salt"
  (fries followed by salt with 2 words between)
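To make the proposed syntax concrete, here is a small Python sketch that assembles a 'window' search clause from its parts. The function name window_clause and its parameters are hypothetical, and the modifier spelling simply mirrors the examples above; the final 2.0 syntax may differ.

```python
def window_clause(index, terms, distance, comparison="<", unit="word", ordered=False):
    """Assemble a CQL search clause using the proposed 'window' relation."""
    relation = f"window/distance{comparison}{distance}/unit={unit}"
    if ordered:
        # The 'ordered' modifier requires the terms to appear in the given order.
        relation += "/ordered"
    return f'{index} {relation} "{terms}"'


unordered = window_clause("dc.title", "fries salt vinegar", 5)
ordered = window_clause("dc.title", "fries salt", 2, comparison="=", ordered=True)
```

The sketch covers only the quoted-term form; the second example above, where the right-hand side is itself a boolean expression, would need a real CQL builder.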

3. Faceted Searching ("scan" a result set)
One might search a library database for books about a particular topic, and
then see how many records there are in each of several time periods.

4. Result Set Size
Allow the client to indicate how much effort the server should take to
determine or estimate the number of records in the result set. Similarly,
allow the response to indicate the accuracy of the result set size reported.
The server may be able to determine the exact number of records, or provide
a realistic estimate, but it may be an expensive process. The server might
prefer not to go through that process unless the client requests that it do so.
Or the client might want to explicitly request that the server go through,
or not go through, that process. (The client might want the first 10
records, or any 10 records, regardless of how many records there are. In
that case if the server goes through the process of determining how many
records there are, it may go through an expensive process for nothing. There
is also the special case where the server cannot determine or estimate the
number of records in the result set. In that case it might be useful to have
a special value or some way to indicate this condition.)
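One possible shape for this, sketched in Python: the client states the effort it wants, and the server reports a count together with its accuracy, using a distinct "unknown" case when no count is available. The names ResultSetSize and report_size and the effort and accuracy values are all hypothetical; the suggestion above does not define any of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResultSetSize:
    count: Optional[int]  # None when the server cannot determine or estimate it
    accuracy: str         # hypothetical values: "exact", "estimate", "unknown"


def report_size(effort: str,
                count_exactly: Optional[Callable[[], int]] = None,
                estimate: Optional[Callable[[], int]] = None) -> ResultSetSize:
    """Do only as much counting work as the client asked for."""
    if effort == "exact" and count_exactly is not None:
        return ResultSetSize(count_exactly(), "exact")
    if effort in ("exact", "estimate") and estimate is not None:
        # Fall back to an estimate when an exact count is unavailable.
        return ResultSetSize(estimate(), "estimate")
    # effort == "none", or the server has no way to count: skip the expensive work.
    return ResultSetSize(None, "unknown")


exact = report_size("exact", count_exactly=lambda: 1234)
rough = report_size("estimate", estimate=lambda: 1200)
skipped = report_size("none", count_exactly=lambda: 1234)
```

A client that wants only the first 10 records would send the lowest effort level and accept an "unknown" size in return.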

5. Multiple Query Types
CQL is currently the only query type used by SRU, but there could be other
query types as well, for example Parameterized Query and XQuery.

6. Eliminate the Version and Operation Parameters
These two parameters are based on the assumption that the same base URL
might be used for different operations and versions. Instead, different base
URLs should be used.

Follow-Ups:
- RE: [search-ws] 2.0 features (draft message to SRU implementors)
  - From: "LeVan,Ralph" <levan@oclc.org>