
Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
From: Gary Cole <gary.cole@oracle.com>
To: Tom Zeller <tzeller@internet2.edu>
Date: Tue, 5 Apr 2011 13:35:58 -0500
Rather than wildcards, I suggested STARTS_WITH, ENDS_WITH and
CONTAINS. Similar effect, but without the confusion and arguments
over which wildcard character means what, and without the arguments
over how far short of full patterns or regular expressions to stop. :-)
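For a provider backed by LDAP, these three operators map directly onto LDAP substring filters (RFC 4515), so the provider can use the "*" wildcard internally without ever exposing wildcard syntax to clients. A minimal sketch, assuming an LDAP back-end; `escape_ldap` and `to_ldap_filter` are hypothetical helpers, not part of SPML:

```python
def escape_ldap(value: str) -> str:
    """Escape a value for an RFC 4515 filter so '*', '(' etc. match literally."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))  # e.g. '*' becomes \2a
        else:
            out.append(ch)
    return "".join(out)


def to_ldap_filter(attr: str, op: str, value: str) -> str:
    """Translate one (attribute, operator, value) clause into an LDAP filter.

    The operator names follow this thread; the mapping is a sketch only.
    """
    v = escape_ldap(value)
    if op == "EQUALS":
        return f"({attr}={v})"
    if op == "STARTS_WITH":
        return f"({attr}={v}*)"
    if op == "ENDS_WITH":
        return f"({attr}=*{v})"
    if op == "CONTAINS":
        return f"({attr}=*{v}*)"
    raise ValueError(f"unsupported operator: {op}")
```

For example, `to_ldap_filter("cn", "STARTS_WITH", "Smi")` yields `(cn=Smi*)`, while a literal "*" supplied by a client is escaped rather than treated as a wildcard.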

On Apr 5, 2011, at 1:29 PM, Tom Zeller wrote:

> Unless I misread, I do not think you specified wildcards or "any"
> value, e.g. "*" in an ldap search filter.
>
> Where would wildcards fit into your search schemes?
>
> There have been a couple of conversations in HE regarding a simple and
> generic search syntax, so this conversation is quite timely.
>
> On Tue, Apr 5, 2011 at 10:09 AM, Gary Cole <gary.cole@oracle.com> wrote:
>> Anil, after reflecting on this briefly, one thing may be misleading
>> about the way I ranked schemes. See inline below.
>>
>> On Apr 5, 2011, at 9:38 AM, Gary Cole wrote:
>>
>>> Anil, I think I can net this out based on my experience. (I've done
>>> this many times with many different back-ends.)
>>>
>>> 1) The biggest simplification you can make is constraining logical
>>> operators to 'AND' alone.
>>> - 'NOT' and 'OR' can be both complex and inefficient.
>>> - While people often think that they want 'NOT', they usually need
>>> only a NOT_EQUAL (NE or !=) matching-operator.
>>> - Most of the complexity and processing burden lies in evaluating
>>> arbitrarily nested clauses.
>>> - If you support only 'AND', then there's no need for a client to
>>> specify a logical operator, which simplifies both syntax and
>>> semantics.
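With 'AND' as the only logical operator, a query can be a flat list of clauses with no operator field at all: an object matches when every clause holds. A minimal sketch, assuming objects are dictionaries of string attribute values, EQUALS is the only matching-operator, and equality is case-insensitive; `matches` and `search` are illustrative names only:

```python
from typing import Dict, List, Tuple

Clause = Tuple[str, str]  # (attribute name, value); no logical operator needed
Obj = Dict[str, str]      # one object's attribute values


def matches(obj: Obj, query: List[Clause]) -> bool:
    """Implicit AND: the object matches only if every clause holds.

    Equality is case-insensitive, as assumed elsewhere in this thread.
    """
    return all(
        attr in obj and obj[attr].casefold() == value.casefold()
        for attr, value in query
    )


def search(objects: List[Obj], query: List[Clause]) -> List[Obj]:
    """Return the objects that satisfy every clause of the query."""
    return [obj for obj in objects if matches(obj, query)]
```

Because the AND is implicit, there is nothing for the client to structure and nothing for the server to parse beyond the list itself.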
>>>
>>> RECOMMENDATIONS:
>>> 1A) If you can get by with 'AND' alone, then by all means do so! This
>>> saves you the most.
>>> 1B) The next-best approach is to support INNER-ANDS and one level of
>>> OUTER-ORS.
>>>      -- Any nesting of ANDs and ORs can be put into this form.
>>>      -- DBMSs do this wherever possible in order to optimize evaluation.
>>>      -- Technically, a client could get the same result by issuing a
>>>         separate query for each ORed set of clauses.
>>> 1C) Add an NE operator ("!=" or "<>") rather than supporting a logical
>>> operator 'NOT'.
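The inner-ANDs/outer-ORs form in 1B is disjunctive normal form, and 1C corresponds to pushing every NOT down to the leaves (De Morgan's laws) so that NOT(attr = v) becomes attr != v. A minimal sketch of that normalization, assuming a hypothetical tuple encoding of queries with only EQ and NE leaves:

```python
from itertools import product
from typing import List, Tuple

# Hypothetical expression encoding, for illustration only:
#   ("EQ", attr, value) / ("NE", attr, value)       leaf clauses
#   ("AND", [e1, e2, ...]) / ("OR", [e1, e2, ...])  / ("NOT", e)
Leaf = Tuple[str, str, str]


def push_not(e, negate: bool = False):
    """Push NOT down to the leaves (De Morgan), turning NOT EQ into NE."""
    kind = e[0]
    if kind == "NOT":
        return push_not(e[1], not negate)
    if kind in ("EQ", "NE"):
        if not negate:
            return e
        return ("NE" if kind == "EQ" else "EQ", e[1], e[2])
    if kind in ("AND", "OR"):
        flipped = {"AND": "OR", "OR": "AND"}[kind] if negate else kind
        return (flipped, [push_not(c, negate) for c in e[1]])
    raise ValueError(f"unknown node: {kind}")


def to_dnf(e) -> List[List[Leaf]]:
    """Return AND-groups (inner ANDs) that are joined by one outer OR."""
    kind = e[0]
    if kind in ("EQ", "NE"):
        return [[e]]
    if kind == "OR":
        return [group for c in e[1] for group in to_dnf(c)]
    if kind == "AND":
        # Distribute AND over OR: one group per combination of child groups.
        groups = []
        for combo in product(*(to_dnf(c) for c in e[1])):
            groups.append([leaf for group in combo for leaf in group])
        return groups
    raise ValueError(f"unexpected node after push_not: {kind}")


def normalize(e) -> List[List[Leaf]]:
    return to_dnf(push_not(e))
```

Each resulting group can be run as a separate AND-only query and the results unioned, as 1B notes. Distributing AND over OR can multiply the number of groups (exponentially in the worst case), which is one concrete reason OR support costs more.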
>>>
>>> 2) As far as matching-operators:
>>> - EQUALS is most necessary and most efficient.
>>> - GT, GTE, LT, LTE are often helpful and are usually efficient.
>>> - STARTS_WITH is very commonly desired for string-valued attributes
>>> (and is efficient).
>>> - CONTAINS and ENDS_WITH are sometimes desired, but are usually
>>> inefficient.
>>>
>>> RECOMMENDATIONS:
>>> 2A) If you can get by with 'EQUALS' alone, then do so. Otherwise,
>>> specify GT, GTE, LT, LTE (and perhaps NE).
>>> 2B) Don't require CONTAINS or ENDS_WITH unless you truly need them.
>>> 2C) You may not need STARTS_WITH; a GTE will do roughly the same thing
>>> for string values.
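The "roughly" in 2C is the missing upper bound: STARTS_WITH(p) is exactly GTE p combined with LT the next string after every p-prefixed value, which a sorted index answers with two binary searches. A minimal sketch, assuming an in-memory sorted list stands in for the index and comparison is lexical by code point (a case-insensitive index would store case-folded values); `starts_with_via_range` is a hypothetical name:

```python
import bisect
from typing import List


def starts_with_via_range(sorted_values: List[str], prefix: str) -> List[str]:
    """Emulate STARTS_WITH as GTE prefix AND LT upper on a sorted index.

    `upper` is the prefix with its last incrementable character bumped by
    one, so every value that begins with `prefix` sorts in [prefix, upper).
    """
    lo = bisect.bisect_left(sorted_values, prefix)  # first value >= prefix (GTE)
    # Trailing U+10FFFF characters cannot be incremented; drop them first.
    stem = prefix.rstrip("\U0010ffff")
    if not stem:
        return sorted_values[lo:]
    upper = stem[:-1] + chr(ord(stem[-1]) + 1)
    hi = bisect.bisect_left(sorted_values, upper)   # first value >= upper (LT bound)
    return sorted_values[lo:hi]
```

With GTE alone the client would have to stop reading by itself once values no longer share the prefix; the LT bound lets the server stop instead.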
>>>
>>> 3) With EQUALS, GT, GTE, LT, LTE, you must decide whether comparison
>>> is always LEXICAL or is ARITHMETIC for numeric values:
>>> - Lexically, "15" < "5".
>>> - Arithmetically, 15 > 5.
>>> Reviewing the approaches I've seen:
>>> 3A) Comparison is arithmetic when comparing a specified value to an
>>> attribute with a numeric syntax.
>>> 3B) Comparison is always lexical (i.e., we treat everything as a
>>> string).
>>> 3C) Define separate operators for lexical and arithmetic comparisons
>>> and validate
>>>      (e.g., throw errors when an arithmetic operator is applied to a
>>> string-valued attribute).
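Approach 3A can be sketched as a comparison that consults the attribute's syntax before deciding how to compare. This assumes a provider-side table of attribute syntaxes; `ATTRIBUTE_SYNTAX` and its entries are hypothetical:

```python
from decimal import Decimal

# Hypothetical attribute-syntax table a provider might keep (illustration only).
ATTRIBUTE_SYNTAX = {"employeeNumber": "numeric", "cn": "string"}


def compare(attr: str, stored: str, specified: str) -> int:
    """Return -1, 0 or 1, comparing a stored value with a query value.

    Scheme 3A: arithmetic when the attribute has a numeric syntax,
    otherwise lexical (plain string comparison, as in scheme 3B).
    """
    if ATTRIBUTE_SYNTAX.get(attr) == "numeric":
        a, b = Decimal(stored), Decimal(specified)
        return (a > b) - (a < b)
    return (stored > specified) - (stored < specified)
```

Here `compare("cn", "15", "5")` is -1 while `compare("employeeNumber", "15", "5")` is 1. A non-numeric query value for a numeric attribute makes `Decimal` raise `decimal.InvalidOperation`, which is roughly where scheme 3C's explicit validation would come in.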
>>>
>>> Collecting these into schemes ranked by levels of simplicity (which
>>> is a new thought exercise for me):
>>> 0. One attribute per query (no AND) and EQUALS is the only matching
>>> operator.
>>> 1. AND and EQUALS only. (AND is implicit).
>>> 2. AND and EQUALS only. (AND is explicit).
>>> 2. AND, EQUALS, GT, GTE, LT, LTE.
>>> 3. AND, EQUALS, GT, GTE, LT, LTE, NE.
>>> 4. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH.
>>> 5. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH, ENDS_WITH, CONTAINS.
>>
>>
>>> 10. INNER ANDS and one level of OUTER ORs (plus all matching operators)
>>
>> Adding support for OUTER ORs increases complexity by at least one
>> order of magnitude.
>>
>>> 100. Nestable AND, OR and NOT (plus all matching operators).
>>
>> Adding support for arbitrarily nested ANDs, ORs and NOTs increases
>> complexity by at least two orders of magnitude (and by more if the
>> back-end does not natively support such constructs).
>>
>>>
>>> Details below after my signature.
>>>
>>> Gary
>>>
>>> <Details>
>>>
>>> Supporting multiple, arbitrarily nested logical operators is complex
>>> and very expensive:
>>> - Client must build (and must structure properly) a more complex
>>> request.
>>> - Server must parse (and must evaluate properly) a more complex
>>> request.
>>> - Implementing NOT on some back-ends (or for nested clauses) is very
>>> complex for (and imposes an inordinate processing burden on) the
>>> server.
>>> - Implementing OR on some back-ends is complex and imposes a
>>> significant processing burden on the server.
>>> - Clients can send inefficiently structured queries, which may tempt
>>> the server to optimize queries, adding more complexity.
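One way to see why NOT is singled out: against an index that maps each (attribute, value) pair to the set of matching object IDs, AND is a set intersection and OR is a union, but NOT has to be computed against the set of all objects. A minimal sketch, assuming such an in-memory equality index and a hypothetical tuple encoding of queries (real back-ends differ):

```python
from typing import Dict, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]  # (attr, value) -> matching object IDs


def evaluate(expr, index: Index, all_ids: Set[str]) -> Set[str]:
    """Evaluate a nested AND/OR/NOT expression over an equality index.

    Hypothetical encoding: ("EQ", attr, value), ("AND", [...]),
    ("OR", [...]), ("NOT", e).
    """
    kind = expr[0]
    if kind == "EQ":
        return set(index.get((expr[1], expr[2]), set()))  # copy, never mutate index
    if kind == "AND":
        children = expr[1]
        if not children:
            return set(all_ids)  # AND of nothing matches everything
        result = evaluate(children[0], index, all_ids)
        for child in children[1:]:
            result &= evaluate(child, index, all_ids)
        return result
    if kind == "OR":
        result: Set[str] = set()
        for child in expr[1]:
            result |= evaluate(child, index, all_ids)
        return result
    if kind == "NOT":
        # The complement needs the full set of object IDs, i.e. it touches
        # every object, which is what makes NOT costly on large back-ends.
        return all_ids - evaluate(expr[1], index, all_ids)
    raise ValueError(f"unknown node: {kind}")
```

AND and OR only touch the entries named in the query; every NOT, at any nesting depth, drags in the whole object population.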
>>>
>>> Operators for matching deserve some thought:
>>> - EQ is always necessary and is simple semantically once you specify
>>> case-sensitivity (in this case, case-insensitive).
>>>   ** Must specify case-sensitivity **
>>> - GT is very helpful, especially for ordering results or retrieving
>>> in chunks.
>>>   ** Must specify whether comparison is lexical and when (if ever)
>>> comparison is arithmetic. **
>>> - LT is used less often than GT, but if you're supporting GT, it
>>> doesn't add much difficulty.
>>> - GTE turns out to be helpful when ordering results or retrieving in
>>> chunks.
>>>   (Again, if you're supporting GT, GTE doesn't add much effort.)
>>> - LTE is the same. If you're supporting GT/GTE, you might as well
>>> support LTE.
>>> - 'STARTS_WITH' is very commonly used with strings. Implementations
>>> are generally efficient.
>>> - 'CONTAINS' is next-most-frequently-requested, but implementation is
>>> usually inefficient.
>>> - 'ENDS_WITH' is sometimes requested, but implementation is almost as
>>> bad as CONTAINS.
>>>
>>> </Details>
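The operator list above can be sketched as a dispatch table of case-insensitive predicates. This is a minimal sketch, assuming string-valued attributes and lexical comparison only; the operator names follow the thread, while `OPERATORS` and `match` are hypothetical:

```python
from typing import Callable, Dict

# Case-insensitive matching, as assumed in this thread for string attributes.
# Comparison is lexical only; arithmetic comparison would need attribute syntax.
Predicate = Callable[[str, str], bool]

OPERATORS: Dict[str, Predicate] = {
    "EQUALS": lambda v, q: v == q,
    "NE": lambda v, q: v != q,
    "GT": lambda v, q: v > q,
    "GTE": lambda v, q: v >= q,
    "LT": lambda v, q: v < q,
    "LTE": lambda v, q: v <= q,
    "STARTS_WITH": lambda v, q: v.startswith(q),  # index-friendly (range scan)
    "ENDS_WITH": lambda v, q: v.endswith(q),      # usually needs a full scan
    "CONTAINS": lambda v, q: q in v,              # usually needs a full scan
}


def match(op: str, stored: str, specified: str) -> bool:
    """Apply one matching-operator after case-folding both sides."""
    try:
        predicate = OPERATORS[op]
    except KeyError:
        raise ValueError(f"unsupported matching-operator: {op}") from None
    return predicate(stored.casefold(), specified.casefold())
```

Per-object predicates like these are all cheap; the cost differences noted above come from indexing. A sorted index can answer EQUALS, GT/GTE/LT/LTE and STARTS_WITH as range lookups, whereas ENDS_WITH and CONTAINS typically force the provider to examine every value.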
>>>
>>>
>>> On Apr 5, 2011, at 6:32 AM, John, Anil wrote:
>>>
>>>> A correction to my original e-mail below:
>>>>
>>>> "...authoritative sources of data that have existing processes in
>>>> place for Attribute Management and as such Updates/Deletes etc are
>>>> *NOT* permitted" [via the SPML interface]
>>>>
>>>> Yes, this would be the minima needed to support conformance to the
>>>> profile. If you support more, that would be a good differentiator for
>>>> the implementation.
>>>>
>>>> As to support for logical operators beyond 'AND' and the set of
>>>> matching-operators, this is where I need a bit of help. Ideally I
>>>> would like to have support for AND/OR/NOT combined with
>>>> (=)/(>)/(<)/(>=)/(<=) applied to a specific subset of
>>>> case-insensitive attributes, but I don't have a sense of how
>>>> expensive the operations are to implement. I would appreciate some
>>>> feedback on that point.
>>>>
>>>> Regards,
>>>>
>>>> - Anil
>>>>
>>>> ________________________________________
>>>> From: Gary Cole [gary.cole@oracle.com]
>>>> Sent: Monday, April 04, 2011 6:03 PM
>>>> To: John, Anil
>>>> Cc: Smith, Thomas C.; OASIS PSTC
>>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>>
>>>> Thanks! I think I begin to understand. Let me play it back to you to
>>>> be sure I have it right. The goal is to simplify search() and
>>>> constrain it so that, rather than being open-ended, search becomes
>>>> simpler to implement and to test. The first dimension to constrain is
>>>> that of supported logical operators: AND would be the only operator.
>>>> Another dimension to constrain would be the set of queryable
>>>> attributes: here a provider would need a way to advertise the subset
>>>> of attributes on which it supports search.
>>>>
>>>> Do I have that right?  If so, let me ask a few more questions.
>>>>
>>>> You didn't mention constraining the set of matching-operators; would
>>>> you want to limit/require certain of these? I assume that you'd want
>>>> equals (=). Would you also want greater-than (>), less-than (<),
>>>> greater-than-or-equal-to (>=), less-than-or-equal-to (<=)? What about
>>>> startsWith, endsWith, contains? Could you get by with just equals and
>>>> startsWith? (The operators endsWith and contains tend to be rather
>>>> expensive and inefficient.) For matching-operators on alphabetic
>>>> values, there's the further issue of case-sensitivity. Would you
>>>> prefer to specify case-insensitive matching or case-sensitive
>>>> matching?
>>>>
>>>> Would this be specifying (in effect) minima? That is, you would not
>>>> object if a provider supported more in the way of search than you
>>>> require, as long as the specified behavior is supported in a standard
>>>> way, right?
>>>>
>>>> On Apr 4, 2011, at 9:01 AM, John, Anil wrote:
>>>>
>>>> My perspective on this is being driven by the need to implement a
>>>> batch/occasionally-connected interface to an Attribute Provider (AP)
>>>> that uses SPML as the interface specification. The motivator for the
>>>> “Read Only” portion is that AP is fronting authoritative sources of
>>>> data that have existing processes in place for Attribute Management
>>>> and as such Updates/Deletes etc are permitted.
>>>>
>>>> The assumption in this case is that the “Attribute Contract” that is
>>>> supported by the AP is known and fixed, i.e., there is a finite set
>>>> of attributes that are exposed via this interface and are advertised
>>>> via the listTargets operation.
>>>>
>>>> At the same time, one of the items that came out of the Burton Group
>>>> discussions around SPML was that implementing a provider that
>>>> supported all permutations of the ‘and’, ‘or’ and ‘not’ operators
>>>> combined with all attributes was non-trivial, which seems to have
>>>> resulted in little to no support and *no way to verify what support
>>>> existed* in individual products.
>>>>
>>>> So, what I’d like to see in this read-only profile is a mechanism
>>>> that limits Search to a combination of specific operators and
>>>> attributes, i.e., I will allow only queries that use specific
>>>> operators combined with specific clauses. E.g., allow only ‘and’
>>>> operations on attributes X, Y and Z.
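Such a constrained profile could be expressed as something the provider advertises and checks every query against up front, which also gives a test suite something concrete to verify. A minimal sketch, assuming a hypothetical `SearchProfile` structure; nothing here is defined by SPMLv2:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Clause = Tuple[str, str, str]  # (attribute, matching-operator, value)


@dataclass(frozen=True)
class SearchProfile:
    """What a provider advertises: logical operators, matching-operators and
    queryable attributes. Hypothetical structure, for illustration only."""
    logical_operators: FrozenSet[str] = frozenset({"AND"})
    matching_operators: FrozenSet[str] = frozenset({"EQUALS"})
    attributes: FrozenSet[str] = frozenset()


def validate(profile: SearchProfile, logical_op: str,
             clauses: List[Clause]) -> List[str]:
    """Return a list of problems; an empty list means the query is supported."""
    problems = []
    if logical_op not in profile.logical_operators:
        problems.append(f"logical operator {logical_op!r} not supported")
    for attr, op, _value in clauses:
        if attr not in profile.attributes:
            problems.append(f"attribute {attr!r} is not queryable")
        if op not in profile.matching_operators:
            problems.append(f"operator {op!r} not supported")
    return problems
```

For example, a profile allowing only 'AND' and EQUALS on attributes X, Y and Z accepts `[("X", "EQUALS", "1")]` and reports a specific problem for anything else, rather than failing in an implementation-specific way.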
>>>>
>>>> The two use cases that are expected to be enabled by this are:
>>>>
>>>> 1) Ability to query the AP to retrieve attributes of multiple users
>>>>    all in one shot, potentially for provisioning use cases
>>>> 2) Ability to do a one-way sync from the AP to a local system, i.e.,
>>>>    the AP will always be the master system that will overwrite the
>>>>    local store
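The one-way sync in use case 2 can be sketched as reconciling two maps keyed by unique ID, with the AP always winning. This assumes the AP snapshot has already been fetched (for example, with one search returning all attributes); `one_way_sync` and the record layout are hypothetical:

```python
from typing import Dict

Record = Dict[str, str]  # attribute name -> value


def one_way_sync(ap: Dict[str, Record], local: Dict[str, Record]) -> Dict[str, int]:
    """Make `local` mirror `ap` (the authoritative master), in place.

    Both maps are keyed by unique ID. Entries missing from the AP are
    deleted locally, and AP values always overwrite local ones.
    Returns counts of what changed, e.g. for logging.
    """
    stats = {"added": 0, "updated": 0, "deleted": 0}
    for uid in list(local):  # copy keys: we delete while iterating
        if uid not in ap:
            del local[uid]
            stats["deleted"] += 1
    for uid, record in ap.items():
        if uid not in local:
            stats["added"] += 1
        elif local[uid] != record:
            stats["updated"] += 1
        else:
            continue  # already identical, nothing to write
        local[uid] = dict(record)
    return stats
```

Because the AP is the master, the local side never pushes anything back, which fits the read-only profile.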
>>>>
>>>> The key here is to make sure that the profile itself is constrained
>>>> enough that it is implementable and testable.
>>>>
>>>> I am not sure if I answered your question to the level of detail you
>>>> are looking for, but I am hoping that there is general interest in
>>>> such a capability.
>>>>
>>>> Regards,
>>>>
>>>> - Anil
>>>>
>>>>
>>>>
>>>> From: Gary Cole [mailto:gary.cole@oracle.com]
>>>> Sent: Tuesday, March 29, 2011 2:44 PM
>>>> To: John, Anil
>>>> Cc: Smith, Thomas C.; OASIS PSTC
>>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>>
>>>> In what ways would you constrain search? Would these constraints be
>>>> minima, maxima or both?
>>>>
>>>> A provider constrains the set of object-classes for which the
>>>> provider supports search in the listTargetsResponse.
>>>>
>>>> Search as specified in SPMLv2 is at heart a subset of LDAP search:
>>>> - scope: 'pso', 'oneLevel' or 'subTree'
>>>> - operators: 'and', 'or', and 'not'
>>>> - clauses: dependent on profile or provider, but examples show
>>>> <attribute-name> = <attribute-value>.
>>>>
>>>> The DSML profile looks especially LDAP-like, since DSML is basically
>>>> LDAP wrapped in a bunch of XML tags. Neither the specification of the
>>>> base protocol nor the DSML profile requires a provider to support
>>>> fancier search than it wishes to provide. The provider simply returns
>>>> an error if:
>>>>
>>>> • The provider cannot evaluate an instance of {QueryClauseType} that
>>>>   the instance of {SearchQueryType} contains.
>>>>
>>>> • The open content of the instance of {SearchQueryType} is too
>>>>   complex for the provider to evaluate.
>>>>
>>>> In short, it's entirely up to the provider how fancy to get with
>>>> search(). We figured that market pressures would incent each
>>>> implementer to support search appropriately in its provider. For
>>>> example, SIM supported search on every type of object. OIM 9.x
>>>> supports DSML search for users.
>>>>
>>>> Gary
>>>>
>>>> On Mar 29, 2011, at 8:39 AM, John, Anil wrote:
>>>>
>>>>
>>>> Gary,
>>>>
>>>>
>>>> Search gives you by default the equivalent of a batch lookup.
>>>>
>>>> So it does, and a constrained-search profile would meet the
>>>> functionality we are looking for (based on your description below).
>>>> Our perspective was shaped in a lot of ways by the reluctance of
>>>> product implementations to implement anything beyond the basic
>>>> operations on an SPML provider. Search is an optional capability (and
>>>> so is batch, but we thought that it would be easier to make a case
>>>> for it).
>>>>
>>>> I would be interested to get the perspective of folks who are product
>>>> implementors to see what would be "easier" to implement for them
>>>> going forward. At the end of the road, we are looking for something
>>>> that will exist in real life within products and not just as
>>>> shelf-ware.
>>>>
>>>> Regards,
>>>>
>>>> - Anil
>>>>
>>>> ________________________________________
>>>> From: Gary Cole [gary.cole@oracle.com]
>>>> Sent: Tuesday, March 29, 2011 9:20 AM
>>>> To: John, Anil
>>>> Cc: Smith, Thomas C.; OASIS PSTC
>>>> Subject: Batch Lookup (was "Re: ReadOnlyProfile")
>>>>
>>>> Anil,
>>>>
>>>> On Mar 29, 2011, at 7:43 AM, John, Anil wrote:
>>>>
>>>> We also need to re-read the specs to see if there is overlap between
>>>> lookup() and search on what we need to accomplish.
>>>>
>>>> Remind me again please what you need to accomplish. I may be able to
>>>> help.
>>>>
>>>> For instance, I may be able to clarify something about your
>>>> requirements for "SPML Operations on an Attribute Service".  You
>>>> originally thought that you needed "batch pull" capabilities  
>>>> because
>>>> SAML Attribute Query could not answer the following questions:
>>>> * "Give me the unique IDs of all users with Attribute X"
>>>> * "For all users (whose unique IDs I just got), give me a listing of
>>>> attributes for each (in one shot)"
>>>>
>>>> SPML's Search Capability (section 3.6.7.1 of the main spec) gives you
>>>> all of that in one shot. You can request one search() and in that
>>>> request use the 'returnData' attribute to specify how much
>>>> information you want back for each matching object: nothing,
>>>> identifier-only, data (which would include all schema-defined
>>>> attributes) or everything, which would add capability-specific data
>>>> to schema-defined data. Another parameter allows you to specify which
>>>> capabilities interest you. In your case, you would specify
>>>> "returnData='data'", so that you would get all of the attributes. Or
>>>> you could take the default, which is 'everything'. Unless you have
>>>> capability-specific data, 'everything' is equivalent to 'data'. A
>>>> client can also specify a maximum limit on the number of matching
>>>> objects to return.
>>>>
>>>> The Provider may send all of the matching objects in a single
>>>> SearchResult, or the provider may break the results into chunks  
>>>> that
>>>> the requestor can iterate.  Logically, it's still part of a single
>>>> search result, although a series of iterate() requests may be
>>>> necessary to return all matching objects.
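On the client side, that single logical result is just a loop: take the first chunk from search(), then keep calling iterate() until the provider signals that nothing remains. A minimal sketch, assuming a hypothetical in-memory `FakeProvider` whose `search()`/`iterate()` methods and opaque iterator token only loosely mirror the SPML operations of the same names (query and returnData handling are omitted):

```python
from typing import Dict, Iterator, List, Optional, Tuple

Chunk = Tuple[List[Dict[str, str]], Optional[str]]  # (objects, iterator token or None)


class FakeProvider:
    """Stand-in for a provider that returns search results in chunks.
    The method names mirror search()/iterate(); the signatures are hypothetical."""

    def __init__(self, objects: List[Dict[str, str]], chunk_size: int) -> None:
        self._objects = objects
        self._chunk = chunk_size
        self._pending: Dict[str, int] = {}  # token -> offset of the next chunk

    def _page(self, start: int) -> Chunk:
        end = start + self._chunk
        if end >= len(self._objects):
            return self._objects[start:end], None  # last chunk: no token
        token = f"it-{end}"
        self._pending[token] = end
        return self._objects[start:end], token

    def search(self) -> Chunk:
        return self._page(0)

    def iterate(self, token: str) -> Chunk:
        return self._page(self._pending.pop(token))


def all_results(provider: FakeProvider) -> Iterator[Dict[str, str]]:
    """Client side: one logical search result, possibly spread over chunks."""
    objects, token = provider.search()
    yield from objects
    while token is not None:
        objects, token = provider.iterate(token)
        yield from objects
```

The client code is identical whether the provider returns everything at once (no token on the first response) or in many chunks, which is why chunking does not change the "one shot" semantics of the search.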
>>>>
>>>> So, please help me to understand what a batch operation would add to
>>>> this? Search gives you by default the equivalent of a batch lookup.
>>>>
>>>> Gary
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe from this mail list, you must leave the OASIS TC  
>>>> that
>>>> generates this mail.  Follow this link to all your TCs in OASIS at:
>>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
>>>>
>>>
>>
>