Subject: Re: [regrep-query] Iteration Support for Queries v0.2
Query Subteam,

I did a quick read of the "Iteration support for Queries" proposal by Nikola and Farrukh and have a couple of issues for the team to consider. I support the authors in their attempt to have a simple facility, but the simplicity does yield some ambiguity, and that ambiguity raises the issues.

Suppose Q is a query that is fixed for the duration of an interaction with the Registry. Suppose the iteration index takes the values 0, 1, 2, ... and the maximum number of objects to be returned is N. Assume the Client does not change the value of N during its sequence of requests. A typical sequence of requests might be as follows:

   R1: submit (Q,0,N) returns objects 1 to N
   R2: submit (Q,1,N) returns objects N+1 to 2N
   R3: submit (Q,2,N) returns objects 2N+1 to 3N
   etc.

The Client might continue to make these requests until some indexed result set is returned with fewer than N objects in it.

The ambiguity arises because each request Ri is treated as a separate submission and thus a separate transaction, so there is no guarantee that the result set for Q will be the same for each request. Some objects may have been deleted between requests, so there is no guarantee that some relevant objects are not skipped over on subsequent requests. Some objects may have been added between requests, so there is no guarantee that other objects won't be returned multiple times. Since there is no requirement that the Registry remember what it has just done in a previous transaction, it may construct a different execution plan for each request and order the results in a different manner (I think this is a real possibility in complex requests!). Should a Registry be required to lock all objects in the result set against update while a Client is casually browsing through it?
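To make the skipped-object and repeated-object cases concrete, here is a minimal Python sketch. It is an illustration only, not anything taken from the proposal: the names fetch_page and fetch_all are hypothetical, the "Registry" is just an in-memory set, and the "query" is assumed to return every object in sorted order, re-evaluated from scratch on each request. Slices are zero-based here, whereas the request sequence above numbers objects from 1.

```python
def fetch_page(store, index, n):
    """Hypothetical stand-in for submit(Q, index, N).

    Re-runs the 'query' (all objects, sorted) on every call and
    returns the index-th slice of at most n objects.
    """
    results = sorted(store)
    return results[index * n:(index + 1) * n]


def fetch_all(store, n, mutate_after_first=None):
    """Client loop: request pages until one comes back with fewer than n objects.

    mutate_after_first, if given, simulates another user changing the
    store between the first and second requests.
    """
    collected = []
    index = 0
    while True:
        page = fetch_page(store, index, n)
        collected.extend(page)
        if len(page) < n:
            return collected
        if index == 0 and mutate_after_first is not None:
            mutate_after_first(store)
        index += 1


# Deleting "a" after the first page shifts everything left by one,
# so "c" is never returned even though it was never deleted.
after_delete = fetch_all(set("abcdef"), 2, lambda s: s.discard("a"))

# Adding "a" after the first page shifts everything right by one,
# so "c" is returned twice.
after_insert = fetch_all(set("bcde"), 2, lambda s: s.add("a"))

print(after_delete)
print(after_insert)
```

In this sketch the ordering is stable between requests; a Registry that also reorders results from one request to the next could produce further anomalies that the sketch does not model.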

The only way to avoid these kinds of anomalies is to require that the Registry treat a sequence of requests like the above as a single transaction, or execute Q only once and hold the "complete" result set (not just IDs) for some indeterminate amount of time. But there is no requirement that either of these be done, and both are expensive as far as the Registry is concerned. Most users most of the time will be willing to trade a little ambiguity for speedy results that are "almost" correct. I suspect that many search engines operate by making that assumption. But will all Clients be that forgiving? Should there be options to allow the Client to specify what it expects?

Issue #1: Should the specification contain a note saying that these kinds of ambiguities are possible and that a Client cannot rely on getting a complete and consistent collection of objects if it retrieves the collection using this kind of iteration? Different conforming registries may address the potential ambiguities in different ways, thereby giving slightly different results.

Issue #2: Should the specification require that the Query be submitted separately from the iterations? This is logically cleaner, but doesn't really avoid the ambiguities unless additional requirements are placed on the Registry to implement transactional semantics or to hold large "complete" result sets for an indefinite time. The SQL notions of Declare Cursor and Fetch from Cursor use this approach, and DECLARE CURSOR has additional INSENSITIVE and SCROLL options that a client can use to tell the database server whether certain ambiguities must be avoided during subsequent Fetches. ODBC and JDBC also use this approach, with a clear separation between the Query, its result set (or sets), and cursor operations.

Issue #3: Should a Client be able to specify some ordering criteria on a result set in order to minimize (but not eliminate) the ambiguities discussed above? Should there be a default ordering based on GUIDs?

Issue #4 (Minor): Is there a reason why startIndex begins with zero instead of one? I suspect that most non-programmers will find it more natural to begin counting with 1. If it makes any difference, the Cursor notion in SQL begins counting with 1, but it is numbering rows, not numbering different partial result sets.

I don't have good answers to these issues. If there is no need for follow-on efforts to support Client options similar to those discussed in Issue #2, then a simple warning like the one in Issue #1 may be sufficient.

Regards,
Len

At 01:13 PM 7/26/2002 -0400, Farrukh Najmi wrote:
>Team,
>
>Attached is the initial proposal for "Iteration Support for Queries"
>work item for V3. Thanks to Nikola for providing initial review on
>version 0.1.
>
>Please review it and provide feedback on the proposal. Ideally, I would
>like to address any sub-team feedback before submitting the proposal to
>the general TC. Note that some broken links in proposal will get
>resolved when merged into ebRS doc.
>
>--
>Regards,
>Farrukh
>