Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")


Anil, after reflecting on this briefly, one thing may be misleading  
about the way I ranked schemes.  See inline below.

On Apr 5, 2011, at 9:38 AM, Gary Cole wrote:

> Anil, I think I can net this out based on my experience.  (I've done  
> this many times with many different back-ends.)
>
> 1) The biggest simplification you can make is constraining logical  
> operators to 'AND' alone.
> - 'NOT' and 'OR' can be both complex and inefficient.
> - While people often think that they want 'NOT', they usually need  
> only a NOT_EQUAL (NE or !=) matching-operator.
> - Most of the complexity and processing burden lies in evaluating  
> arbitrarily nested clauses.
> - If you support only 'AND', then there's no need for a client to  
> specify a logical operator,
>   which simplifies both syntax and semantics.
>
> RECOMMENDATIONS:
> 1A) If you can get by with 'AND' alone, then by all means do so!   
> This saves you the most.
> 1B) Next best approach is to support INNER-ANDS and one level of  
> OUTER-ORS.
>       -- Any nesting of ANDs and ORs can be put into this form.
>       -- DBMS do this wherever possible in order to optimize  
> evaluation.
>       -- Technically, client could get the same result by issuing a  
> separate query for each ORed set of clauses.
> 1C) Add a NE operator ("!=" or "<>") rather than supporting a  
> logical operator 'NOT'.
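To illustrate recommendation 1B, here is a minimal sketch of rewriting an arbitrarily nested AND/OR filter as one level of OUTER-ORs over INNER-ANDs. The tuple shapes for clauses and nodes, and the function name `to_or_of_ands`, are assumptions made for this example; they are not part of SPML or of any product discussed in this thread.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical shapes, for illustration only:
#   leaf clause: (attribute, operator, value)      -> 3 elements
#   inner node:  ("AND", [children]) or ("OR", [children]) -> 2 elements
Clause = Tuple[str, str, str]


def to_or_of_ands(node) -> List[List[Clause]]:
    """Return AND-groups whose OR is equivalent to the nested filter."""
    if len(node) == 3:
        # A single clause is one group containing one clause.
        return [[node]]
    op, children = node
    if op == "OR":
        # OR simply concatenates the groups of its children.
        groups: List[List[Clause]] = []
        for child in children:
            groups.extend(to_or_of_ands(child))
        return groups
    if op == "AND":
        # Distribute AND over OR: take one group from each child
        # and merge the chosen groups into a single AND-group.
        child_groups = [to_or_of_ands(child) for child in children]
        return [sum(combo, []) for combo in product(*child_groups)]
    raise ValueError(f"unsupported logical operator: {op}")


if __name__ == "__main__":
    dept = ("dept", "EQ", "eng")
    active = ("status", "EQ", "active")
    pending = ("status", "EQ", "pending")
    nested = ("AND", [dept, ("OR", [active, pending])])
    for group in to_or_of_ands(nested):
        print(group)
```

Each resulting group is an AND-only query, so a client could issue one query per group and merge the results. Note that the rewrite can grow quickly: an AND of n two-way ORs expands to 2^n groups.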
>
> 2) As far as matching-operators:
> - EQUALS is most necessary and most efficient.
> - GT, GTE, LT, LTE are often helpful and are usually efficient.
> - STARTS_WITH is very commonly desired for string-valued attributes  
> (and is efficient).
> - CONTAINS and ENDS_WITH are sometimes desired, but are usually  
> inefficient.
>
> RECOMMENDATIONS:
> 2A) If you can get by with 'EQUALS' alone, then do so. Otherwise,  
> specify GT, GTE, LT, LTE (and perhaps NE).
> 2B) Don't require CONTAINS or ENDS_WITH unless you truly need them.
> 2C) May not need STARTS_WITH; a GTE will do roughly the same thing  
> for string values.
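As a sketch of what "roughly the same thing" in 2C can mean in practice: over lexically ordered values, GTE on the prefix finds where the matches begin, and an LT upper bound is needed to cut off where they end. The helper names below are hypothetical, and the upper-bound trick assumes a non-empty prefix whose last character is not the highest code point.

```python
from bisect import bisect_left
from typing import List


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with prefix.

    Assumption: prefix is non-empty and its last character can be
    incremented by one code point.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def gte_only(sorted_values: List[str], prefix: str) -> List[str]:
    """value >= prefix: starts at the right place but never stops."""
    return sorted_values[bisect_left(sorted_values, prefix):]


def starts_with_via_range(sorted_values: List[str], prefix: str) -> List[str]:
    """value >= prefix AND value < upper bound: same hits as STARTS_WITH."""
    lo = bisect_left(sorted_values, prefix)
    hi = bisect_left(sorted_values, prefix_upper_bound(prefix))
    return sorted_values[lo:hi]


if __name__ == "__main__":
    names = sorted(["stone", "smith", "sam", "snow", "smart", "smythe"])
    print(gte_only(names, "sm"))
    print(starts_with_via_range(names, "sm"))
```

With GTE alone, a client would have to stop reading results at the first value that no longer carries the prefix, which only works when results come back in lexical order.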
>
> 3) With EQUALS, GT, GTE, LT, LTE, you must decide whether comparison  
> is always LEXICAL or is ARITHMETIC for numeric values:
> - Lexically, "15" < "5"
> - Arithmetically, 15 > 5.
> Reviewing the approaches I've seen:
> 3A) Comparison is arithmetic when comparing a specified value to an  
> attribute with a numeric syntax.
> 3B) Comparison is always lexical (i.e., we treat everything as a  
> string).
> 3C) Define separate operators for lexical and arithmetic comparisons  
> and validate
>       (e.g., throw errors when an arithmetic operator is applied to  
> a string-valued attribute).
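The difference between the lexical and arithmetic readings can be shown in a few lines. This is only a sketch: the function names are made up for the example, and treating "numeric" as "parses as an integer" is an assumption.

```python
def compare_lexical(a: str, b: str) -> int:
    """Approach 3B: compare as strings. Returns -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_arithmetic(a: str, b: str) -> int:
    """Approach 3C: a separate arithmetic comparison that validates.

    int() raises ValueError when either operand is not numeric.
    """
    x, y = int(a), int(b)
    return (x > y) - (x < y)


def compare(a: str, b: str, numeric_syntax: bool) -> int:
    """Approach 3A: let the attribute's declared syntax pick the comparison."""
    if numeric_syntax:
        return compare_arithmetic(a, b)
    return compare_lexical(a, b)


if __name__ == "__main__":
    print(compare_lexical("15", "5"))      # "15" sorts before "5"
    print(compare_arithmetic("15", "5"))   # 15 is greater than 5
```

Whichever approach a profile picks, the same pair of values can order differently, so the choice has to be stated for GT, GTE, LT and LTE to be testable.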
>
> Collecting these into schemes ranked by levels of simplicity (which  
> is a new thought exercise for me):
> 0. One attribute per query (no AND) and EQUALS is the only matching  
> operator.
> 1. AND and EQUALS only. (AND is implicit).
> 2. AND and EQUALS only. (AND is explicit).
> 2. AND, EQUALS, GT, GTE, LT, LTE.
> 3. AND, EQUALS, GT, GTE, LT, LTE, NE.
> 4. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH.
> 5. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH, ENDS_WITH, CONTAINS.


> 10. INNER ANDS and one level of OUTER ORs (plus all matching  
> operators)

Adding support for OUTER ORs increases complexity by at least one  
order of magnitude.

> 100. Nestable AND, OR and NOT (plus all matching operators).

Adding support for arbitrarily nested ANDs, ORs and NOTs increases  
complexity by at least two orders of magnitude (and by more if the  
back-end does not natively support such constructs).
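
For contrast with the nested case, here is a rough sketch of why schemes 1 through 5 stay simple: with implicit AND, evaluation is a single flat loop over clauses, and moving up a level only adds entries to an operator table. The record and clause shapes are assumptions for this example, and comparison is lexical and case-insensitive purely to keep the sketch short.

```python
import operator
from typing import Dict, FrozenSet, Iterable, Tuple

Clause = Tuple[str, str, str]  # (attribute, operator, value), hypothetical

# Scheme 5 operator table; smaller schemes use a subset of these keys.
OPERATORS = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
    "STARTS_WITH": str.startswith,
    "ENDS_WITH": str.endswith,
    "CONTAINS": lambda actual, value: value in actual,
}


def matches(
    record: Dict[str, str],
    clauses: Iterable[Clause],
    allowed: FrozenSet[str] = frozenset(OPERATORS),
) -> bool:
    """Implicit AND: every clause must hold for the record to match."""
    for attr, op, value in clauses:
        if op not in allowed:
            raise ValueError(f"operator not supported: {op}")
        actual = record.get(attr)
        # A missing attribute never matches in this sketch.
        if actual is None or not OPERATORS[op](actual.lower(), value.lower()):
            return False
    return True


if __name__ == "__main__":
    user = {"sn": "Smith", "uid": "jsmith"}
    print(matches(user, [("sn", "EQ", "smith"), ("uid", "STARTS_WITH", "js")]))
```

There is no tree to parse and no recursion; a provider restricted to scheme 1 would pass `allowed=frozenset({"EQ"})` and reject everything else.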

>
> Details below after my signature.
>
> Gary
>
> <Details>
>
> Supporting multiple, arbitrarily nested logical operators is complex  
> and very expensive:
> - Client must build (and must structure properly) a more complex  
> request
> - Server must parse (and must evaluate properly) a more complex  
> request
> - Implementing NOT on some back-ends (or for nested clauses)
>   is very complex for (and imposes an inordinate processing burden  
> on) the server.
> - Implementing OR on some back-ends is complex and imposes a  
> significant processing burden on the server.
> - Clients can send inefficiently-structured queries, which may tempt  
> the server to optimize queries, adding more complexity.
>
> Operators for matching deserve some thought:
> - EQ is always necessary and is simple semantically once you specify  
> case-sensitivity (in this case, case-insensitive).
>    ** Must specify case-sensitivity **
> - GT is very helpful, especially for ordering results or retrieving  
> in chunks.
>    ** Must specify whether comparison is lexical and when (if ever)  
> comparison is arithmetic. **
> - LT is used less often than GT, but if you're supporting GT, LT doesn't
> add much difficulty.
> - GTE turns out to be helpful when ordering results or retrieving
> in chunks.
>    (Again, if you're supporting GT, GTE doesn't add much effort.)
> - LTE is the same.  If you're supporting GT/GTE, might as well  
> support LTE.
> - 'STARTS_WITH' is very commonly used with strings.  Implementations  
> are generally efficient.
> - 'CONTAINS' is next-most-frequently-requested, but implementation  
> is usually inefficient.
> - 'ENDS_WITH' is sometimes requested, but implementation is almost  
> as bad as CONTAINS.
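The remark that GT helps with retrieving in chunks can be sketched as follows: each request asks for records whose key is greater than the last key already received. The in-memory list, the `uid` key and both function names are assumptions standing in for a real back-end.

```python
from typing import Dict, List

Record = Dict[str, str]


def fetch_chunk(records: List[Record], after_uid: str, limit: int) -> List[Record]:
    """One AND-only query: uid GT after_uid, ordered by uid, capped at limit."""
    hits = sorted(
        (r for r in records if r["uid"] > after_uid),
        key=lambda r: r["uid"],
    )
    return hits[:limit]


def fetch_all(records: List[Record], limit: int) -> List[Record]:
    """Walk the whole result set chunk by chunk using only GT."""
    collected: List[Record] = []
    last_uid = ""  # the empty string sorts before every non-empty uid
    while True:
        chunk = fetch_chunk(records, last_uid, limit)
        if not chunk:
            return collected
        collected.extend(chunk)
        last_uid = chunk[-1]["uid"]


if __name__ == "__main__":
    people = [{"uid": u} for u in ["carol", "alice", "bob", "erin", "dave"]]
    print([r["uid"] for r in fetch_all(people, limit=2)])
```

This relies on lexical comparison and on unique, non-empty keys, which is one reason the comparison rules need to be specified.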
>
> </Details>
>
>
> On Apr 5, 2011, at 6:32 AM, John, Anil wrote:
>
>> A correction to my original e-mail below:
>>
>> "...authoritative sources of data that have existing processes in  
>> place for Attribute Management and as such Updates/Deletes etc are  
>> *NOT* permitted" [via the SPML interface]
>>
>> Yes, this would be the minima needed to support conformance to the  
>> profile. If you support more, that would be a good differentiator  
>> for the implementation.
>>
>> As to support for logical operators beyond 'AND' and the set of
>> matching-operators, this is where I need a bit of help. Ideally I
>> would like to have support for AND/OR/NOT combined with
>> (=)/(>)/(<)/(>=)/(<=) applied to a specific subset of case-insensitive
>> attributes, but I don't have a sense of how expensive the
>> operations are to implement. I would appreciate some feedback on that
>> point.
>>
>> Regards,
>>
>> - Anil
>>
>> ________________________________________
>> From: Gary Cole [gary.cole@oracle.com]
>> Sent: Monday, April 04, 2011 6:03 PM
>> To: John, Anil
>> Cc: Smith, Thomas C.; OASIS PSTC
>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>
>> Thanks!  I think I begin to understand.  Let me play it back to you  
>> to be sure I have it right.  The goal is to simplify search() and  
>> constrain it so that, rather than being open-ended, search becomes  
>> simpler to implement and to test.  The first dimension to constrain  
>> is that of supported logical operators: AND would be the only  
>> operator.  Another dimension to constrain would be the set of  
>> queryable attributes: here a provider would need a way to advertise  
>> the subset of attributes on which it supports search.
>>
>> Do I have that right?  If so, let me ask a few more questions.
>>
>> You didn't mention constraining the set of matching-operators;
>> would you want to limit/require certain of these?  I assume that
>> you'd want equals (=).  Would you also want greater-than (>),
>> less-than (<), greater-than-or-equal-to (>=), less-than-or-equal-to
>> (<=)?  What about startsWith, endsWith, contains?  Could you get by
>> with just equals and startsWith?  (The operators endsWith and
>> contains tend to be rather expensive and inefficient.)  For
>> matching-operators on alphabetic values, there's the further issue
>> of case-sensitivity.  Would you prefer to specify case-insensitive
>> matching or case-sensitive matching?
>>
>> Would this be specifying (in effect) minima?  That is, you would  
>> not object if a provider supported more in the way of search than  
>> you require, as long as the specified behavior is supported in a  
>> standard way, right?
>>
>> On Apr 4, 2011, at 9:01 AM, John, Anil wrote:
>>
>> My perspective on this is being driven by the need to implement a
>> batch/occasionally-connected interface to an Attribute Provider
>> (AP) that uses SPML as the interface specification. The motivator  
>> for the “Read Only” portion is that AP is fronting authoritative  
>> sources of data that have existing processes in place for Attribute  
>> Management and as such Updates/Deletes etc are permitted.
>>
>> The assumption in this case is that the “Attribute Contract” that
>> is supported by the AP is known and fixed, i.e., there is a finite
>> set of attributes that are exposed via this interface and are
>> advertised via the listTargets operation.
>>
>> At the same time, one of the items that came out of the Burton
>> Group discussions around SPML was that implementing a provider that
>> supported all permutations of the ‘and’, ‘or’ and ‘not’
>> operators combined with all attributes was non-trivial, which seems
>> to have ended with little to no support and *no way to verify what
>> support existed* in individual products.
>>
>> So, what I’d like to see in this read only profile is a mechanism
>> that limits Search to a combination of specific operators and
>> attributes, i.e., I will allow only queries that combine specific
>> operators with specific clauses.  E.g., allow only ‘and’ operations
>> on attributes X, Y and Z.
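One way to picture such a constrained profile is as a small, checkable declaration of which attributes and matching-operators a provider accepts. The sketch below is an assumption-laden illustration: `SearchProfile`, `violations` and the clause tuple shape are invented for this example and do not come from the SPML specification.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

Clause = Tuple[str, str, str]  # (attribute, operator, value), hypothetical


@dataclass(frozen=True)
class SearchProfile:
    """What a provider advertises as searchable (hypothetical shape)."""
    attributes: FrozenSet[str]
    operators: FrozenSet[str]


def violations(profile: SearchProfile, clauses: Iterable[Clause]) -> List[str]:
    """List every way an AND-only query steps outside the profile."""
    problems: List[str] = []
    for attr, op, _value in clauses:
        if attr not in profile.attributes:
            problems.append(f"attribute not searchable: {attr}")
        if op not in profile.operators:
            problems.append(f"operator not supported: {op}")
    return problems


if __name__ == "__main__":
    profile = SearchProfile(
        attributes=frozenset({"X", "Y", "Z"}),
        operators=frozenset({"EQ"}),
    )
    print(violations(profile, [("X", "EQ", "1"), ("Y", "EQ", "2")]))
    print(violations(profile, [("W", "CONTAINS", "1")]))
```

Because the profile is finite, a conformance test could enumerate every allowed attribute and operator pair, which addresses the "no way to verify what support existed" concern.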
>>
>> The two use cases that are expected to be enabled by this are:
>>
>> 1)      Ability to query the AP to retrieve attributes of multiple  
>> users all in one shot, potentially for provisioning use cases
>> 2)      Ability to do a one way synch from the AP to a local
>> system, i.e., the AP will always be the master system that will
>> overwrite the local store
>>
>> The key here is to make sure that the profile itself is constrained  
>> enough that it is implementable and testable.
>>
>> I am not sure if I answered your question to the level of detail  
>> you are looking for but I am hoping that there is general interest  
>> in such a capability.
>>
>> Regards,
>>
>> -        Anil
>>
>>
>>
>> From: Gary Cole [mailto:gary.cole@oracle.com]
>> Sent: Tuesday, March 29, 2011 2:44 PM
>> To: John, Anil
>> Cc: Smith, Thomas C.; OASIS PSTC
>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>
>> In what ways would you constrain search?  Would these constraints  
>> be minima, maxima or both?
>>
>> A provider constrains the set of object-classes for which the  
>> provider supports search in the listTargetsResponse.
>>
>> Search as specified in SPMLv2 is at heart a subset of LDAP search:
>> - scope: 'pso', 'oneLevel' or 'subTree'
>> - operators: 'and', 'or', and 'not'
>> - clauses: dependent on profile or provider, but examples show  
>> <attribute-name> = <attribute-value>.
>>
>> The  DSML profile looks especially LDAP-like, since DSML is  
>> basically LDAP wrapped in a bunch of XML tags.  Nothing in the  
>> specification of the base protocol nor the DSML profile requires a  
>> provider to support fancier search than it wishes to provide.  The  
>> provider simply returns an error if:
>>
>> •     The provider cannot evaluate an instance of {QueryClauseType}  
>> that the instance of {SearchQueryType} contains.
>>
>> •     The open content of the instance of {SearchQueryType} is too  
>> complex for the provider to evaluate.
>>
>> In short, it's entirely up to the provider how fancy to get with  
>> search().  We figured that market-pressures would incent each  
>> implementer to support search appropriately in its provider.  For  
>> example, SIM supported search on every type of object.  OIM 9.x  
>> supports DSML search for users.
>>
>> Gary
>>
>> On Mar 29, 2011, at 8:39 AM, John, Anil wrote:
>>
>>
>> Gary,
>>
>>
>> Search gives you by default the equivalent of a batch lookup.
>>
>> So it does, and a constrained-search profile would meet the  
>> functionality we are looking for (based on your description  
>> below).  Our perspective was shaped in a lot of ways by the  
>> reluctance of product implementations to implement anything beyond  
>> the basic operations on a SPML provider.  Search is an optional
>> capability (and so is batch, but we thought that it would be easier
>> to make a case for it).
>>
>> I would be interested to get the perspective of folks who are  
>> product implementors to see what would be "easier" to implement for  
>> them going forward. At the end of the road, we are looking for  
>> something that will exist in real-life within products and not just  
>> as shelf-ware.
>>
>> Regards,
>>
>> - Anil
>>
>> ________________________________________
>> From: Gary Cole [gary.cole@oracle.com]
>> Sent: Tuesday, March 29, 2011 9:20 AM
>> To: John, Anil
>> Cc: Smith, Thomas C.; OASIS PSTC
>> Subject: Batch Lookup (was "Re: ReadOnlyProfile")
>>
>> Anil,
>>
>> On Mar 29, 2011, at 7:43 AM, John, Anil wrote:
>>
>> We also need to re-read the specs to see if there is overlap between
>> lookup() and search on what we need to accomplish.
>>
>> Remind me again please what you need to accomplish.  I may be able to
>> help.
>>
>> For instance, I may be able to clarify something about your
>> requirements for "SPML Operations on an Attribute Service".  You
>> originally thought that you needed "batch pull" capabilities because
>> SAML Attribute Query could not answer the following questions:
>> * "Give me the unique id's of all users with Attribute X"
>> * "For all users (whose unique id's I just got), give me listing of
>> attributes for each (in one shot)"
>>
>> SPML's Search Capability (section 3.6.7.1 of the main spec) gives you
>> all of that in one shot.  You can request one search() and in that
>> request use the 'returnData' attribute to specify how much information
>> you want back for each matching object:  nothing, identifier-only,
>> data (which would include all schema-defined attributes) or
>> everything, which would add capability-specific data to schema-defined
>> data.  Another parameter allows you to specify which capabilities
>> interest you.  In your case, you would specify "returnData='data'", so
>> that you would get all of the attributes.  Or you could take the
>> default, which is 'everything'.  Unless you have capability-specific
>> data, 'everything' is equivalent to 'data'.  A client can also specify
>> a maximum limit on the number of matching objects to return.
>>
>> The Provider may send all of the matching objects in a single
>> SearchResult, or the provider may break the results into chunks that
>> the requestor can iterate.  Logically, it's still part of a single
>> search result, although a series of iterate() requests may be
>> necessary to return all matching objects.
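The behavior described above can be mimicked with a toy in-memory stand-in. This is not the SPML wire protocol or any toolkit's API: the function names, the `max_select` and `chunk_size` parameters, and the object layout are all assumptions, and only the 'identifier', 'data' and 'everything' levels are modeled.

```python
from typing import Callable, Dict, Iterator, List, Optional

Obj = Dict[str, object]


def project(obj: Obj, return_data: str) -> Obj:
    """Trim one matching object to the requested level of detail."""
    if return_data == "identifier":
        return {"id": obj["id"]}
    if return_data not in ("data", "everything"):
        raise ValueError(f"unknown returnData value: {return_data}")
    result: Obj = {"id": obj["id"], "data": obj["data"]}
    # 'everything' differs from 'data' only when capability data exists.
    if return_data == "everything" and obj.get("capabilityData"):
        result["capabilityData"] = obj["capabilityData"]
    return result


def search(
    objects: List[Obj],
    predicate: Callable[[Obj], bool],
    return_data: str = "everything",
    max_select: Optional[int] = None,
    chunk_size: int = 2,
) -> Iterator[List[Obj]]:
    """Yield the result in chunks; each further chunk stands for one iterate()."""
    hits = [project(o, return_data) for o in objects if predicate(o)]
    if max_select is not None:
        hits = hits[:max_select]
    for start in range(0, len(hits), chunk_size):
        yield hits[start:start + chunk_size]


if __name__ == "__main__":
    users = [
        {"id": "u1", "data": {"dept": "eng"}},
        {"id": "u2", "data": {"dept": "eng"}},
        {"id": "u3", "data": {"dept": "ops"}},
    ]
    for chunk in search(users, lambda o: True, return_data="identifier"):
        print(chunk)
```

From the client's point of view it is still one logical result; the chunking only changes how many round trips are needed to collect it.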
>>
>> So, please help me to understand what a batch operation would add to
>> this?  Search gives you by default the equivalent of a batch lookup.
>>
>> Gary
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe from this mail list, you must leave the OASIS TC that
>> generates this mail.  Follow this link to all your TCs in OASIS at:
>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>