
Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
From: Tom Zeller <tzeller@internet2.edu>
To: Gary Cole <gary.cole@oracle.com>
Date: Tue, 5 Apr 2011 13:29:31 -0500
Unless I misread, I do not think you specified wildcards or an "any"
value, e.g. "*" in an LDAP search filter.

Where would wildcards fit into your search schemes?

There have been a couple of conversations in HE regarding a simple and
generic search syntax, so this conversation is quite timely.
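A minimal Python sketch of where wildcards could fit, assuming LDAP-style filter values in which "*" is the only wildcard. A trailing "*" maps to STARTS_WITH, a leading one to ENDS_WITH, one on each side to CONTAINS, and a bare "*" to a presence test. Anything else (e.g. "a*b") would need more than one operator. The function name `classify_wildcard` and the PRESENT label are hypothetical, not SPML or LDAP API names.

```python
from typing import Tuple


def classify_wildcard(value: str) -> Tuple[str, str]:
    """Map an LDAP-style filter value that may contain '*' to (operator, operand)."""
    if value == "*":
        return ("PRESENT", "")  # e.g. (mail=*): attribute has any value
    if "*" not in value:
        return ("EQUALS", value)
    parts = value.split("*")
    if len(parts) == 2:
        head, tail = parts
        if head and not tail:
            return ("STARTS_WITH", head)  # e.g. (sn=Sm*)
        if tail and not head:
            return ("ENDS_WITH", tail)  # e.g. (sn=*son)
    if len(parts) == 3 and not parts[0] and not parts[2] and parts[1]:
        return ("CONTAINS", parts[1])  # e.g. (sn=*mit*)
    raise ValueError(f"{value!r} would need more than one matching operator")
```

Under the ranking Gary gives, only the STARTS_WITH case stays cheap. The ENDS_WITH and CONTAINS cases fall into the tier he calls inefficient.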

On Tue, Apr 5, 2011 at 10:09 AM, Gary Cole <gary.cole@oracle.com> wrote:
> Anil, after reflecting on this briefly, one thing may be misleading about
> the way I ranked schemes.  See inline below.
>
> On Apr 5, 2011, at 9:38 AM, Gary Cole wrote:
>
>> Anil, I think I can net this out based on my experience.  (I've done this
>> many times with many different back-ends.)
>>
>> 1) The biggest simplification you can make is constraining logical
>> operators to 'AND' alone.
>> - 'NOT' and 'OR' can be both complex and inefficient.
>> - While people often think that they want 'NOT', they usually need only a
>> NOT_EQUAL (NE or !=) matching-operator.
>> - Most of the complexity and processing burden lies in evaluating
>> arbitrarily nested clauses.
>> - If you support only 'AND', then there's no need for a client to specify
>> a logical operator,
>>  which simplifies both syntax and semantics.
>>
>> RECOMMENDATIONS:
>> 1A) If you can get by with 'AND' alone, then by all means do so!  This
>> saves you the most.
>> 1B) The next-best approach is to support INNER-ANDS and one level of
>> OUTER-ORS.
>>      -- Any nesting of ANDs and ORs can be put into this form.
>>      -- DBMSs do this wherever possible in order to optimize evaluation.
>>      -- Technically, a client could get the same result by issuing a
>> separate query for each ORed set of clauses.
>> 1C) Add a NE operator ("!=" or "<>") rather than supporting a logical
>> operator 'NOT'.
>>
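A minimal Python sketch of recommendations 1B and 1C. It assumes a hypothetical tuple encoding of queries and single-valued, always-present attributes. For multi-valued or absent attributes, NOT(EQ) and NE are not equivalent. De Morgan's laws push NOT down to the leaf clauses, turning NOT(EQ) into NE. AND is then distributed over OR, which yields inner AND-groups under one outer OR. Each group could equally be sent as its own AND-only query, as 1B notes.

```python
from itertools import product

# Hypothetical query encoding (not SPML syntax):
#   (op, attr, value) for leaf clauses, op in EQ, NE, GT, GTE, LT, LTE
#   ("AND", [exprs]), ("OR", [exprs]), ("NOT", expr)
# Negating a comparison; assumes single-valued, always-present attributes.
NEGATE = {"EQ": "NE", "NE": "EQ", "GT": "LTE", "LTE": "GT", "LT": "GTE", "GTE": "LT"}


def push_not(expr, negate=False):
    """Eliminate NOT by pushing it to the leaves (De Morgan's laws)."""
    op = expr[0]
    if op == "NOT":
        return push_not(expr[1], not negate)
    if op in ("AND", "OR"):
        if negate:
            op = "OR" if op == "AND" else "AND"
        return (op, [push_not(child, negate) for child in expr[1]])
    return (NEGATE[op], expr[1], expr[2]) if negate else expr


def to_dnf(expr):
    """Return a list of AND-groups (lists of leaf clauses) that are ORed together."""
    op = expr[0]
    if op == "OR":
        return [group for child in expr[1] for group in to_dnf(child)]
    if op == "AND":
        # Distribute AND over OR: choose one group from each child and concatenate.
        return [sum(combo, []) for combo in product(*(to_dnf(child) for child in expr[1]))]
    return [[expr]]  # a single leaf clause forms a one-clause group


def normalize(expr):
    """Inner ANDs under one outer OR, with NOT replaced by NE (and friends)."""
    return to_dnf(push_not(expr))
```

Distribution can multiply the number of groups. An AND of n two-way ORs yields 2^n groups, which is one concrete source of the extra cost attributed to ORs.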
>> 2) As far as matching-operators:
>> - EQUALS is most necessary and most efficient.
>> - GT, GTE, LT, LTE are often helpful and are usually efficient.
>> - STARTS_WITH is very commonly desired for string-valued attributes (and
>> is efficient).
>> - CONTAINS and ENDS_WITH are sometimes desired, but are usually
>> inefficient.
>>
>> RECOMMENDATIONS:
>> 2A) If you can get by with 'EQUALS' alone, then do so. Otherwise, specify
>> GT, GTE, LT, LTE (and perhaps NE).
>> 2B) Don't require CONTAINS or ENDS_WITH unless you truly need them.
>> 2C) You may not need STARTS_WITH; a GTE will do roughly the same thing for
>> string values.
>>
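A minimal Python sketch of 2C. It assumes case-sensitive comparison in Unicode code-point order; for case-insensitive matching, fold case on both sides first. A GTE on the prefix alone still matches every later value, so ordered, chunked retrieval has to stop at the first non-matching value. Adding an LT on the prefix's successor makes the rewrite exact, and both clauses fit the AND-only schemes. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

MAX_CODE_POINT = chr(0x10FFFF)


def prefix_upper_bound(prefix: str) -> Optional[str]:
    """Smallest string that sorts after every string beginning with `prefix`.

    Drops trailing maximal code points, then bumps the last remaining one.
    Returns None when no finite bound exists (empty or all-maximal prefix).
    """
    stem = prefix.rstrip(MAX_CODE_POINT)
    if not stem:
        return None
    return stem[:-1] + chr(ord(stem[-1]) + 1)


def starts_with_as_range(attr: str, prefix: str) -> List[Tuple[str, str, str]]:
    """Rewrite STARTS_WITH(attr, prefix) as GTE (plus LT when a bound exists)."""
    clauses = [("GTE", attr, prefix)]
    upper = prefix_upper_bound(prefix)
    if upper is not None:
        clauses.append(("LT", attr, upper))
    return clauses
```

For example, STARTS_WITH "Sm" becomes GTE "Sm" AND LT "Sn".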
>> 3) With EQUALS, GT, GTE, LT, LTE, you must decide whether comparison is
>> always LEXICAL or is ARITHMETIC for numeric values:
>> - Lexically, "15" < "5"
>> - Arithmetically, 15 > 5.
>> Reviewing the approaches I've seen:
>> 3A) Comparison is arithmetic when comparing a specified value to an
>> attribute with a numeric syntax.
>> 3B) Comparison is always lexical (i.e., we treat everything as a string).
>> 3C) Define separate operators for lexical and arithmetic comparisons and
>> validate (e.g., throw errors when an arithmetic operator is applied to a
>> string-valued attribute).
>>
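A minimal Python sketch of approach 3A. It assumes a hypothetical `SCHEMA` table that records each attribute's syntax, plus the case-insensitive matching assumed elsewhere in this thread. `int()` raises ValueError when a numeric attribute is compared with a non-numeric value. That is roughly the point where approach 3C would report an error instead.

```python
# Hypothetical schema: attribute name -> syntax.
SCHEMA = {"uid": "string", "employeeNumber": "integer"}


def compare(attr: str, stored: str, specified: str) -> int:
    """Return -1, 0 or 1, comparing arithmetically when the attribute's syntax
    is numeric (approach 3A) and lexically, case-insensitively, otherwise."""
    if SCHEMA.get(attr) == "integer":
        a, b = int(stored), int(specified)  # ValueError if not numeric
        return (a > b) - (a < b)
    x, y = stored.casefold(), specified.casefold()
    return (x > y) - (x < y)
```

With this table, `compare("uid", "15", "5")` is -1 (lexically "15" < "5"), while `compare("employeeNumber", "15", "5")` is 1.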
>> Collecting these into schemes ranked by levels of simplicity (which is a
>> new thought exercise for me):
>> 0. One attribute per query (no AND) and EQUALS is the only matching
>> operator.
>> 1. AND and EQUALS only. (AND is implicit).
>> 2. AND and EQUALS only. (AND is explicit).
>> 3. AND, EQUALS, GT, GTE, LT, LTE.
>> 4. AND, EQUALS, GT, GTE, LT, LTE, NE.
>> 5. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH.
>> 6. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH, ENDS_WITH, CONTAINS.
>
>
>> 10. INNER ANDs and one level of OUTER ORs (plus all matching operators)
>
> Adding support for OUTER ORs increases complexity by at least one order of
> magnitude.
>
>> 100. Nestable AND, OR and NOT (plus all matching operators).
>
> Adding support for arbitrarily nested ANDs, ORs and NOTs increases
> complexity by at least two orders of magnitude (and by more if the back-end
> does not natively support such constructs).
>
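A minimal Python sketch of scheme 1, which shows how little the low end of this ranking demands. It assumes entries and queries are plain dicts of single-valued string attributes compared case-insensitively. Because AND is implicit and EQUALS is the only operator, the request carries no operators at all. Evaluation is a single pass with no parsing or nesting. The names are hypothetical.

```python
from typing import Dict, List

Entry = Dict[str, str]


def matches_scheme1(entry: Entry, query: Entry) -> bool:
    """Implicit AND of case-insensitive EQUALS clauses; the query is just
    attribute -> value pairs, so it carries no operators at all."""
    return all(
        attr in entry and entry[attr].casefold() == value.casefold()
        for attr, value in query.items()
    )


def search_scheme1(entries: List[Entry], query: Entry) -> List[Entry]:
    """Return every entry matching all pairs; an empty query matches everything."""
    return [entry for entry in entries if matches_scheme1(entry, query)]
```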
>>
>> Details below after my signature.
>>
>> Gary
>>
>> <Details>
>>
>> Supporting multiple, arbitrarily nested logical operators is complex and
>> very expensive:
>> - Client must build (and must structure properly) a more complex request
>> - Server must parse (and must evaluate properly) a more complex request
>> - Implementing NOT on some back-ends (or for nested clauses)
>>  is very complex for (and imposes an inordinate processing burden on) the
>> server.
>> - Implementing OR on some back-ends is complex and imposes a significant
>> processing burden on the server.
>> - Clients can send inefficiently structured queries, which may tempt the
>> server to optimize queries, adding more complexity.
>>
>> Operators for matching deserve some thought:
>> - EQ is always necessary and is simple semantically once you specify
>> case-sensitivity (in this case, case-insensitive).
>>   ** Must specify case-sensitivity **
>> - GT is very helpful, especially for ordering results or retrieving in
>> chunks.
>>   ** Must specify whether comparison is lexical and when (if ever)
>> comparison is arithmetic. **
>> - LT is used less often than GT, but if you're already supporting GT, it
>> doesn't add much difficulty.
>> - GTE turns out to be helpful when ordering results or retrieving in
>> chunks.
>>   (Again, if you're supporting GT, GTE doesn't add much effort.)
>> - LTE is the same.  If you're supporting GT/GTE, might as well support
>> LTE.
>> - 'STARTS_WITH' is very commonly used with strings.  Implementations are
>> generally efficient.
>> - 'CONTAINS' is next-most-frequently-requested, but implementation is
>> usually inefficient.
>> - 'ENDS_WITH' is sometimes requested, but implementation is almost as bad
>> as CONTAINS.
>>
>> </Details>
>>
>>
>> On Apr 5, 2011, at 6:32 AM, John, Anil wrote:
>>
>>> A correction to my original e-mail below:
>>>
>>> "...authoritative sources of data that have existing processes in place
>>> for Attribute Management and as such Updates/Deletes etc are *NOT*
>>> permitted" [via the SPML interface]
>>>
>>> Yes, this would be the minima needed to support conformance to the
>>> profile. If you support more, that would be a good differentiator for the
>>> implementation.
>>>
>>> As to support for logical operators beyond 'AND' and the set of
>>> matching-operators, this is where I need a bit of help. Ideally I would
>>> like to have support for AND/OR/NOT combined with (=)/(>)/(<)/(>=)/(<=)
>>> applied to a specific subset of case-insensitive attributes, but I don't
>>> have a sense of how expensive the operations are to implement. Would
>>> appreciate some feedback on that point.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>> ________________________________________
>>> From: Gary Cole [gary.cole@oracle.com]
>>> Sent: Monday, April 04, 2011 6:03 PM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> Thanks!  I think I begin to understand.  Let me play it back to you to be
>>> sure I have it right.  The goal is to simplify search() and constrain it so
>>> that, rather than being open-ended, search becomes simpler to implement and
>>> to test.  The first dimension to constrain is that of supported logical
>>> operators: AND would be the only operator.  Another dimension to constrain
>>> would be the set of queryable attributes: here a provider would need a way
>>> to advertise the subset of attributes on which it supports search.
>>>
>>> Do I have that right?  If so, let me ask a few more questions.
>>>
>>> You didn't mention constraining the set of matching-operators; would you
>>> want to limit/require certain of these?  I assume that you'd want equals
>>> (=).  Would you also want greater-than (>), less-than (<),
>>> greater-than-or-equal-to (>=), less-than-or-equal-to (<=)?  What about
>>> startsWith, endsWith, contains?  Could you get by with just equals and
>>> startsWith?  (The operators endsWith and contains tend to be rather
>>> expensive and inefficient.)  For matching-operators on alphabetic values,
>>> there's the further issue of case-sensitivity.  Would you prefer to
>>> specify case-insensitive matching or case-sensitive matching?
>>>
>>> Would this be specifying (in effect) minima?  That is, you would not
>>> object if a provider supported more in the way of search than you require,
>>> as long as the specified behavior is supported in a standard way, right?
>>>
>>> On Apr 4, 2011, at 9:01 AM, John, Anil wrote:
>>>
>>> My perspective on this is being driven by the need to implement a
>>> batch/occasionally-connected interface to an Attribute Provider (AP) that
>>> uses SPML as the interface specification. The motivator for the “Read Only”
>>> portion is that AP is fronting authoritative sources of data that have
>>> existing processes in place for Attribute Management and as such
>>> Updates/Deletes etc are permitted.
>>>
>>> The assumption in this case is that the “Attribute Contract” that is
>>> supported by the AP is known and fixed, i.e., there is a finite set of
>>> attributes that are exposed via this interface and are advertised via the
>>> listTargets operation.
>>>
>>> At the same time, one of the items that came out of the Burton Group
>>> discussions around SPML was that implementing a provider that supported
>>> all permutations of the ‘and’, ‘or’ and ‘not’ operators combined with all
>>> attributes was non-trivial, which seems to have resulted in little to no
>>> support and *no way to verify what support existed* in individual products.
>>>
>>> So, what I’d like to see in this read only profile is a mechanism that
>>> limits Search to a combination of specific operators and attributes,
>>> i.e., I will allow only queries that combine specific operators with
>>> specific clauses. E.g., allow only ‘and’ operations on attributes X, Y
>>> and Z.
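A minimal Python sketch of the kind of advertisement described here. It assumes a hypothetical per-attribute map of allowed matching operators. The `SearchCapability` structure is not part of SPMLv2. Queries are AND-only lists of (operator, attribute, value) tuples. Anything outside the advertised set is rejected, mirroring the error an SPMLv2 provider may already return for clauses it cannot evaluate.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Clause = Tuple[str, str, str]  # (operator, attribute, value)


class UnsupportedQuery(Exception):
    """The query uses an attribute or operator the provider did not advertise."""


@dataclass
class SearchCapability:
    # Hypothetical advertisement: searchable attribute -> allowed operators.
    # Clauses are always ANDed, so no logical operators are advertised.
    attributes: Dict[str, Set[str]] = field(default_factory=dict)

    def validate(self, clauses: Iterable[Clause]) -> None:
        for op, attr, _value in clauses:
            allowed = self.attributes.get(attr)
            if allowed is None:
                raise UnsupportedQuery(f"attribute {attr!r} is not searchable")
            if op not in allowed:
                raise UnsupportedQuery(f"operator {op} is not supported on {attr!r}")

    def accepts(self, clauses: Iterable[Clause]) -> bool:
        try:
            self.validate(clauses)
        except UnsupportedQuery:
            return False
        return True


# "Allow only 'and' operations on attributes X, Y and Z", with GTE also on Y.
cap = SearchCapability({"X": {"EQ"}, "Y": {"EQ", "GTE"}, "Z": {"EQ"}})
```

Because the advertisement is explicit data, a conformance test can enumerate it and probe each attribute/operator pair. This addresses the "no way to verify what support existed" problem.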
>>>
>>> The two use cases that are expected to be enabled by this are:
>>>
>>> 1) Ability to query the AP to retrieve attributes of multiple users
>>> all in one shot, potentially for provisioning use cases
>>> 2) Ability to do a one-way sync from the AP to a local system, i.e.,
>>> the AP will always be the master system that will overwrite the local
>>> store
>>>
>>> The key here is to make sure that the profile itself is constrained
>>> enough that it is implementable and testable.
>>>
>>> I am not sure if I answered your question to the level of detail you are
>>> looking for but I am hoping that there is general interest in such a
>>> capability.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>>
>>>
>>> From: Gary Cole [mailto:gary.cole@oracle.com]
>>> Sent: Tuesday, March 29, 2011 2:44 PM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> In what ways would you constrain search?  Would these constraints be
>>> minima, maxima or both?
>>>
>>> A provider constrains the set of object-classes for which the provider
>>> supports search in the listTargetsResponse.
>>>
>>> Search as specified in SPMLv2 is at heart a subset of LDAP search:
>>> - scope: 'pso', 'oneLevel' or 'subTree'
>>> - operators: 'and', 'or', and 'not'
>>> - clauses: dependent on profile or provider, but examples show
>>> <attribute-name> = <attribute-value>.
>>>
>>> The DSML profile looks especially LDAP-like, since DSML is basically
>>> LDAP wrapped in a bunch of XML tags.  Nothing in the specification of the
>>> base protocol nor the DSML profile requires a provider to support fancier
>>> search than it wishes to provide.  The provider simply returns an error if:
>>>
>>> • The provider cannot evaluate an instance of {QueryClauseType} that
>>> the instance of {SearchQueryType} contains.
>>>
>>> • The open content of the instance of {SearchQueryType} is too
>>> complex for the provider to evaluate.
>>>
>>> In short, it's entirely up to the provider how fancy to get with
>>> search().  We figured that market-pressures would incent each implementer to
>>> support search appropriately in its provider.  For example, SIM supported
>>> search on every type of object.  OIM 9.x supports DSML search for users.
>>>
>>> Gary
>>>
>>> On Mar 29, 2011, at 8:39 AM, John, Anil wrote:
>>>
>>>
>>> Gary,
>>>
>>>
>>> Search gives you by default the equivalent of a batch lookup.
>>>
>>> So it does, and a constrained-search profile would meet the functionality
>>> we are looking for (based on your description below).  Our perspective was
>>> shaped in a lot of ways by the reluctance of product implementations to
>>> implement anything beyond the basic operations on a SPML provider.  Search
>>> is an optional capability (and so is batch, but we thought that it would be
>>> easier to make a case for it).
>>>
>>> I would be interested to get the perspective of folks who are product
>>> implementors to see what would be "easier" to implement for them going
>>> forward. At the end of the road, we are looking for something that will
>>> exist in real-life within products and not just as shelf-ware.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>> ________________________________________
>>> From: Gary Cole [gary.cole@oracle.com]
>>> Sent: Tuesday, March 29, 2011 9:20 AM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> Anil,
>>>
>>> On Mar 29, 2011, at 7:43 AM, John, Anil wrote:
>>>
>>> We also need to re-read the specs to see if there is overlap between
>>> lookup() and search on what we need to accomplish.
>>>
>>> Remind me again please what you need to accomplish.  I may be able to
>>> help.
>>>
>>> For instance, I may be able to clarify something about your
>>> requirements for "SPML Operations on an Attribute Service".  You
>>> originally thought that you needed "batch pull" capabilities because
>>> SAML Attribute Query could not answer the following questions:
>>> * "Give me the unique IDs of all users with Attribute X"
>>> * "For all users (whose unique IDs I just got), give me a listing of
>>> attributes for each (in one shot)"
>>>
>>> SPML's Search Capability (section 3.6.7.1 of the main spec) gives you
>>> all of that in one shot.  You can request one search() and in that
>>> request use the 'returnData' attribute to specify how much information
>>> you want back for each matching object:  nothing, identifier-only,
>>> data (which would include all schema-defined attributes) or
>>> everything, which would add capability-specific data to schema-defined
>>> data.  Another parameter allows you to specify which capabilities
>>> interest you.  In your case, you would specify "returnData='data'", so
>>> that you would get all of the attributes.  Or you could take the
>>> default, which is 'everything'.  Unless you have capability-specific
>>> data, 'everything' is equivalent to 'data'.  A client can also specify
>>> a maximum limit on the number of matching objects to return.
>>>
>>> The Provider may send all of the matching objects in a single
>>> SearchResult, or the provider may break the results into chunks that
>>> the requestor can iterate.  Logically, it's still part of a single
>>> search result, although a series of iterate() requests may be
>>> necessary to return all matching objects.
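A minimal Python sketch of the client side of this exchange. It assumes a hypothetical in-memory provider whose `search()` returns one chunk plus an iterator ID (or None when nothing remains) and whose `iterate()` returns the next chunk. These method names and the `max_select` parameter only loosely mirror SPML's search and iterate operations; they are not an SPML binding. The `search_all` generator presents the chunks as one logical search result.

```python
from typing import Iterator


class FakeProvider:
    """Hypothetical in-memory provider that returns results in fixed-size chunks."""

    def __init__(self, objects, chunk_size=2):
        self._objects = list(objects)
        self._chunk = chunk_size
        self._pending = {}  # iterator id -> objects not yet returned
        self._next_id = 0

    def _respond(self, remaining):
        chunk, rest = remaining[: self._chunk], remaining[self._chunk :]
        iterator_id = None
        if rest:
            iterator_id = f"it-{self._next_id}"
            self._next_id += 1
            self._pending[iterator_id] = rest
        return chunk, iterator_id

    def search(self, predicate, max_select=None):
        hits = [o for o in self._objects if predicate(o)]
        if max_select is not None:
            hits = hits[:max_select]  # client-specified maximum number of matches
        return self._respond(hits)

    def iterate(self, iterator_id):
        return self._respond(self._pending.pop(iterator_id))


def search_all(provider, predicate, max_select=None) -> Iterator[dict]:
    """Client side: one logical search result, possibly spread over several chunks."""
    chunk, iterator_id = provider.search(predicate, max_select)
    yield from chunk
    while iterator_id is not None:
        chunk, iterator_id = provider.iterate(iterator_id)
        yield from chunk
```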
>>>
>>> So, please help me understand what a batch operation would add to
>>> this.  Search gives you by default the equivalent of a batch lookup.
>>>
>>> Gary
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that
>>> generates this mail.  Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php