regrep-query message

Subject: Re: [regrep-query] Iteration Support for Queries v0.2
From: Len Gallagher <lgallagher@nist.gov>
To: "regrep-query@lists.oasis-open.org" <regrep-query@lists.oasis-open.org>
Date: Fri, 26 Jul 2002 16:04:32 -0400

Query Subteam,

I did a quick read of the "Iteration support for Queries" proposal by 
Nikola and Farrukh and have a few issues for the team to consider. I 
support the authors in their attempt to keep the facility simple, but that 
simplicity introduces some ambiguity, which raises the issues below.

Suppose Q is a query that is fixed for the duration of an interaction with 
the Registry.

Suppose the iteration index takes the values 0, 1, 2, ... and the maximum 
number of objects to be returned per request is N. Assume the Client does 
not change the value of N during its sequence of requests. A typical 
sequence of requests might be as follows:

R1:  submit (Q,0,N) returns objects 1 to N
R2:  submit (Q,1,N) returns objects N+1 to 2N
R3:  submit (Q,2,N) returns objects 2N+1 to 3N
etc.

The Client might continue to make these requests until some indexed result 
set is returned with fewer than N objects in it.
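
A minimal sketch of that client loop, in Python. The names fetch_all, 
submit, and make_registry are hypothetical and not part of the proposal; 
Q is taken as fixed, so submit only receives the iteration index and N. 
The in-memory list stands in for a Registry whose contents do not change.

```python
from typing import Callable, List


def fetch_all(submit: Callable[[int, int], List[str]], n: int) -> List[str]:
    """Call submit(index, n) for index 0, 1, 2, ... until a short page arrives."""
    results: List[str] = []
    index = 0
    while True:
        page = submit(index, n)
        results.extend(page)
        if len(page) < n:  # fewer than N objects: treat as the last page
            break
        index += 1
    return results


def make_registry(objects: List[str]) -> Callable[[int, int], List[str]]:
    """Hypothetical stand-in for a Registry answering (Q, index, N) requests."""
    def submit(index: int, n: int) -> List[str]:
        return objects[index * n:(index + 1) * n]
    return submit


all_objects = fetch_all(make_registry(["a", "b", "c", "d", "e", "f", "g"]), 3)
```

When the total is an exact multiple of N, this loop needs one extra 
request that returns an empty page before it can stop.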

The ambiguity arises because each request Ri is treated as a separate 
submission and thus a separate transaction, so there is no guarantee that 
the result set for Q will be the same for each request. Some objects may 
have been deleted between requests so there is no guarantee that some 
relevant objects are not skipped over on subsequent requests. Some objects 
may have been added between requests so there is no guarantee that other 
objects won't be returned multiple times. Since there is no requirement 
that the Registry remember what it has just done in a previous transaction, 
it may construct a different execution plan for each request and order the 
results in a different manner (I think this is a real possibility in 
complex requests!).
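
Both anomalies can be reproduced with a small simulation. This is only a 
sketch under simplifying assumptions: the Registry is modelled as an 
ordered Python list, each request slices whatever the list contains at 
that moment, and the function name page is hypothetical.

```python
def page(objects, index, n):
    """Return the index-th block of n objects from the current registry state."""
    return objects[index * n:(index + 1) * n]


registry = ["a", "b", "c", "d", "e", "f"]
n = 2

# Case 1: an object is deleted between the first and second request.
first = page(registry, 0, n)                     # ['a', 'b']
after_delete = [o for o in registry if o != "a"]
second_after_delete = page(after_delete, 1, n)   # ['d', 'e'], so 'c' is skipped

# Case 2: an object is added ahead of the current position between requests.
after_insert = ["x"] + registry
second_after_insert = page(after_insert, 1, n)   # ['b', 'c'], so 'b' is repeated

skipped = "c" not in first + second_after_delete
repeated = [o for o in second_after_insert if o in first]
```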

Should a Registry be required to lock all objects in the Result set against 
Update while a Client is casually browsing through it?

The only way to avoid these kinds of anomalies is to require that the 
Registry treat a sequence of requests like the above as a single 
transaction, or execute Q only once and hold the "complete" result set (not 
just IDs) for some indeterminate amount of time. But there is no 
requirement that either of these be done, and both are expensive as far as 
the Registry is concerned.
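
The second alternative (execute Q once and hold the result) might look 
roughly like the sketch below. Everything here is an assumption made for 
illustration: the class SnapshotRegistry, the methods open_query, fetch, 
and close, and the token mechanism are hypothetical and do not appear in 
the proposal. The sketch also shows where the cost comes from, since each 
open query keeps its own copy of the result until it is closed.

```python
import itertools


class SnapshotRegistry:
    """Hypothetical registry that evaluates a query once and pages over a held copy."""

    def __init__(self, objects):
        self.objects = list(objects)
        self._snapshots = {}
        self._tokens = itertools.count(1)

    def open_query(self, predicate):
        # Evaluate the query once and hold the complete result set.
        token = next(self._tokens)
        self._snapshots[token] = [o for o in self.objects if predicate(o)]
        return token

    def fetch(self, token, index, n):
        # Page over the held copy, not over the live registry contents.
        snapshot = self._snapshots[token]
        return snapshot[index * n:(index + 1) * n]

    def close(self, token):
        # Release the held result set.
        self._snapshots.pop(token, None)


reg = SnapshotRegistry(["a", "b", "c", "d"])
token = reg.open_query(lambda o: True)
page0 = reg.fetch(token, 0, 2)
reg.objects.remove("a")          # concurrent delete after the first page
page1 = reg.fetch(token, 1, 2)   # still ['c', 'd']; nothing is skipped
reg.close(token)
```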

Most users most of the time will be willing to trade a little ambiguity for 
speedy results that are "almost" correct. I suspect that many search 
engines operate by making that assumption. But will all Clients be that 
forgiving? Should there be options to allow the client to specify what it 
expects?

Issue #1: Should the specification contain a note saying that these kinds 
of ambiguities are possible and that a Client cannot rely on getting a 
complete and consistent collection of objects if they retrieve the 
collection using this kind of iteration? Different conforming registries 
may address the potential ambiguities in different ways thereby giving 
slightly different results.

Issue #2: Should the specification require that the Query be submitted 
separately from the iterations? This is logically cleaner, but doesn't 
really avoid the ambiguities unless additional requirements are placed on 
the Registry to implement transactional semantics or to hold large 
"complete" result sets for an indefinite time. SQL's DECLARE CURSOR and 
FETCH statements use this approach, and DECLARE CURSOR has additional 
INSENSITIVE and SCROLL options that a client can use to tell the database 
server whether certain ambiguities must be avoided during subsequent 
fetches. ODBC and JDBC also use this approach, with a clear separation 
between the Query, its result set (or sets), and cursor operations.
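
For comparison, Python's DB-API follows the same separation: the query is 
executed once on a cursor, and blocks of rows are then fetched from that 
cursor. The sketch below uses the standard sqlite3 module with an 
in-memory database; the table name registry_object and its contents are 
made up for illustration. It shows only the query/fetch split and does 
not demonstrate the INSENSITIVE or SCROLL behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registry_object (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO registry_object (id) VALUES (?)",
    [(f"obj-{i}",) for i in range(1, 8)],
)

# The query is submitted once ...
cur = conn.cursor()
cur.execute("SELECT id FROM registry_object ORDER BY id")

# ... and the iterations are separate fetch operations on the cursor.
pages = []
while True:
    rows = cur.fetchmany(3)
    if not rows:
        break
    pages.append([row[0] for row in rows])

conn.close()
```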

Issue #3: Should a Client be able to specify some ordering criteria on a 
result set in order to minimize (but not eliminate) the ambiguities 
discussed above? Should there be a default ordering based on GUIDs?
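
A small sketch of what a default ordering would and would not buy. The 
GUID values and the function page_sorted are hypothetical. Sorting makes 
the pages independent of the order in which an execution plan happens to 
produce the objects, but a delete between requests still shifts later 
pages.

```python
def page_sorted(ids, index, n):
    """Order the result by GUID before slicing out the requested block."""
    ordered = sorted(ids)
    return ordered[index * n:(index + 1) * n]


# Two execution plans that return the same objects in different orders.
plan_a = ["g3", "g1", "g4", "g2"]
plan_b = ["g2", "g4", "g1", "g3"]

same_page = page_sorted(plan_a, 1, 2) == page_sorted(plan_b, 1, 2)

# Ordering does not remove the anomaly: delete 'g1' after page 0 was read.
page0 = page_sorted(plan_a, 0, 2)               # ['g1', 'g2']
remaining = [g for g in plan_a if g != "g1"]
page1 = page_sorted(remaining, 1, 2)            # ['g4'], so 'g3' is skipped
```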

Issue #4 (Minor): Is there a reason why startIndex begins with zero instead 
of one? I suspect that most non-programmers will find it more natural to 
begin counting with 1. If it makes any difference, the Cursor notion in SQL 
begins counting with 1, but it is numbering rows, not numbering different 
partial result sets.

I don't have good answers to these issues. If there is no need for 
follow-on efforts to support Client options similar to those discussed in 
Issue #2, then a simple warning like the one in Issue #1 may be sufficient.

Regards,
Len


At 01:13 PM 7/26/2002 -0400, Farrukh Najmi wrote:
>Team,
>
>Attached is the initial proposal for the "Iteration Support for Queries"
>work item for V3. Thanks to Nikola for providing the initial review of
>version 0.1.
>
>Please review it and provide feedback on the proposal. Ideally, I would
>like to address any sub-team feedback before submitting the proposal to
>the general TC. Note that some broken links in the proposal will get
>resolved when it is merged into the ebRS doc.
>
>--
>Regards,
>Farrukh
>