
Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")


Anil, I think I can net this out based on my experience.  (I've done  
this many times with many different back-ends.)

1) The biggest simplification you can make is constraining logical  
operators to 'AND' alone.
- 'NOT' and 'OR' can be both complex and inefficient.
- While people often think that they want 'NOT', they usually need  
only a NOT_EQUAL (NE or !=) matching-operator.
- Most of the complexity and processing burden lies in evaluating  
arbitrarily nested clauses.
- If you support only 'AND', then there's no need for a client to  
specify a logical operator,
    which simplifies both syntax and semantics.

RECOMMENDATIONS:
1A) If you can get by with 'AND' alone, then by all means do so!  This  
saves you the most.
1B) Next best approach is to support INNER-ANDS and one level of
OUTER-ORS.
        -- Any nesting of ANDs and ORs can be put into this form.
        -- DBMS do this wherever possible in order to optimize  
evaluation.
        -- Technically, client could get the same result by issuing a  
separate query for each ORed set of clauses.
1C) Add a NE operator ("!=" or "<>") rather than supporting a logical  
operator 'NOT'.
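
A minimal sketch of recommendations 1B and 1C, assuming an in-memory list of records stands in for the back-end. The names Clause, matches_all, search and search_by_union are hypothetical and are not part of SPML. A query is a list of AND-groups; a record matches if any group matches, which gives the same result as issuing one AND-only query per group and merging.

```python
import operator
from typing import Any, Callable, Dict, List, NamedTuple

# Matching operators; NE stands in for a logical NOT (see 1C).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQ": operator.eq,
    "NE": operator.ne,
}

class Clause(NamedTuple):
    attr: str
    op: str
    value: Any

Record = Dict[str, Any]

def matches_all(record: Record, clauses: List[Clause]) -> bool:
    """Inner AND: every clause must hold.
    Assumption: a record that lacks the attribute never matches the clause."""
    return all(
        c.attr in record and OPS[c.op](record[c.attr], c.value)
        for c in clauses
    )

def search(records: List[Record], or_groups: List[List[Clause]]) -> List[Record]:
    """Outer OR: a record matches if any AND-group matches."""
    return [r for r in records if any(matches_all(r, g) for g in or_groups)]

def search_by_union(records: List[Record], or_groups: List[List[Clause]]) -> List[Record]:
    """Same result set via one AND-only query per group, merged without duplicates."""
    seen, merged = set(), []
    for group in or_groups:
        for r in search(records, [group]):
            if id(r) not in seen:
                seen.add(id(r))
                merged.append(r)
    return merged

USERS = [
    {"uid": "alice", "dept": "eng", "status": "active"},
    {"uid": "bob", "dept": "ops", "status": "inactive"},
    {"uid": "carol", "dept": "ops", "status": "active"},
]

# (dept = eng) OR (dept = ops AND status != inactive)
QUERY = [
    [Clause("dept", "EQ", "eng")],
    [Clause("dept", "EQ", "ops"), Clause("status", "NE", "inactive")],
]

if __name__ == "__main__":
    print([r["uid"] for r in search(USERS, QUERY)])
```

With AND alone (1A), or_groups would always hold exactly one group, so the client would not need to name a logical operator at all.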

2) As far as matching-operators:
- EQUALS is most necessary and most efficient.
- GT, GTE, LT, LTE are often helpful and are usually efficient.
- STARTS_WITH is very commonly desired for string-valued attributes  
(and is efficient).
- CONTAINS and ENDS_WITH are sometimes desired, but are usually  
inefficient.

RECOMMENDATIONS:
2A) If you can get by with 'EQUALS' alone, then do so. Otherwise,  
specify GT, GTE, LT, LTE (and perhaps NE).
2B) Don't require CONTAINS or ENDS_WITH unless you truly need them.
2C) May not need STARTS_WITH; a GTE will do roughly the same thing for  
string values.
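
A rough sketch of 2C, assuming case-sensitive comparison by code point (Python's default for str). GTE alone returns everything that sorts at or after the prefix, so the sketch pairs it with an LT on an upper bound to recover exact STARTS_WITH behavior. The helper names are hypothetical.

```python
from typing import List

def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string starting with `prefix`.
    Assumption: prefix is non-empty and its last character is not the
    maximum code point (chr would raise ValueError in that case)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def gte_only(values: List[str], prefix: str) -> List[str]:
    """What a bare GTE returns: the prefix matches plus everything after them."""
    return [v for v in values if v >= prefix]

def starts_with_via_range(values: List[str], prefix: str) -> List[str]:
    """STARTS_WITH expressed as GTE on the prefix AND LT on its upper bound."""
    upper = prefix_upper_bound(prefix)
    return [v for v in values if prefix <= v < upper]

NAMES = ["smith", "smythe", "snow", "Smith", "sm", "sl"]

if __name__ == "__main__":
    print(gte_only(NAMES, "sm"))
    print(starts_with_via_range(NAMES, "sm"))
```

If the profile specifies case-insensitive matching, both the stored value and the prefix would need the same case folding before comparison.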

3) With EQUALS, GT, GTE, LT, LTE, you must decide whether comparison  
is always LEXICAL or is ARITHMETIC for numeric values:
- Lexically, "15" < "5"
- Arithmetically, 15 > 5.
Reviewing the approaches I've seen:
3A) Comparison is arithmetic when comparing a specified value to an  
attribute with a numeric syntax.
3B) Comparison is always lexical (i.e., we treat everything as a  
string).
3C) Define separate operators for lexical and arithmetic comparisons  
and validate
        (e.g., throw errors when an arithmetic operator is applied to  
a string-valued attribute).
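
To make the difference concrete, here is a small sketch of approaches 3A and 3B. The SYNTAX table and the attribute names in it are hypothetical; a real provider would take attribute syntaxes from its schema.

```python
import operator
from typing import Callable, Dict

# Hypothetical attribute syntaxes, for illustration only.
SYNTAX: Dict[str, type] = {"employeeNumber": int, "sn": str}

OPS: Dict[str, Callable] = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
}

def compare_lexical(op: str, stored: str, wanted: str) -> bool:
    """Approach 3B: treat everything as a string."""
    return OPS[op](stored, wanted)

def compare_by_syntax(attr: str, op: str, stored: str, wanted: str) -> bool:
    """Approach 3A: arithmetic when the attribute's syntax is numeric,
    lexical otherwise."""
    if SYNTAX[attr] is int:
        return OPS[op](int(stored), int(wanted))
    return compare_lexical(op, stored, wanted)

if __name__ == "__main__":
    print(compare_lexical("LT", "15", "5"))                      # lexical: "15" < "5"
    print(compare_by_syntax("employeeNumber", "GT", "15", "5"))  # arithmetic: 15 > 5
```

Approach 3C would instead expose two operator families and reject, say, an arithmetic GT on the string-valued attribute rather than silently falling back to lexical comparison.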

Collecting these into schemes ranked by levels of simplicity (which is  
a new thought exercise for me):
0. One attribute per query (no AND) and EQUALS is the only matching  
operator.
1. AND and EQUALS only. (AND is implicit).
2. AND and EQUALS only. (AND is explicit).
3. AND, EQUALS, GT, GTE, LT, LTE.
4. AND, EQUALS, GT, GTE, LT, LTE, NE.
5. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH.
6. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH, ENDS_WITH, CONTAINS.
10. INNER ANDS and one level of OUTER ORs (plus all matching operators)
100. Nestable AND, OR and NOT (plus all matching operators).

Details below after my signature.

Gary

<Details>

Supporting multiple, arbitrarily nested logical operators is complex  
and very expensive:
- Client must build (and must structure properly) a more complex request
- Server must parse (and must evaluate properly) a more complex request
- Implementing NOT on some back-ends (or for nested clauses)
    is very complex for (and imposes an inordinate processing burden  
on) the server.
- Implementing OR on some back-ends is complex and imposes a  
significant processing burden on the server.
- Clients can send inefficiently-structured queries, which may tempt  
the server to optimize queries, adding more complexity.

Operators for matching deserve some thought:
- EQ is always necessary and is simple semantically once you specify  
case-sensitivity (in this case, case-insensitive).
     ** Must specify case-sensitivity **
- GT is very helpful, especially for ordering results or retrieving
in chunks.
     ** Must specify whether comparison is lexical and when (if ever)
comparison is arithmetic. **
- LT is used less often than GT, but if you're supporting GT it doesn't
add much difficulty.
- GTE turns out to be helpful when ordering results or retrieving
in chunks.
     (Again, if you're supporting GT, GTE doesn't add much effort.)
- LTE is the same.  If you're supporting GT/GTE, might as well support  
LTE.
- 'STARTS_WITH' is very commonly used with strings.  Implementations  
are generally efficient.
- 'CONTAINS' is next-most-frequently-requested, but implementation is  
usually inefficient.
- 'ENDS_WITH' is sometimes requested, but implementation is almost as  
bad as CONTAINS.
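
The "retrieving in chunks" use of GT mentioned above can be sketched as follows. This assumes unique, lexically ordered identifiers and uses an in-memory list as a stand-in for the back-end; fetch_chunk and iterate_all are hypothetical names, not SPML operations.

```python
from typing import Iterator, List, Optional

def fetch_chunk(ids: List[str], after: Optional[str], limit: int) -> List[str]:
    """Stand-in for a back-end query: id > after, ordered by id, at most `limit` rows."""
    candidates = sorted(i for i in ids if after is None or i > after)
    return candidates[:limit]

def iterate_all(ids: List[str], limit: int) -> Iterator[List[str]]:
    """Walk the whole result set chunk by chunk, resuming each request
    with GT on the last identifier already seen."""
    after: Optional[str] = None
    while True:
        chunk = fetch_chunk(ids, after, limit)
        if not chunk:
            return
        yield chunk
        after = chunk[-1]

IDS = ["u05", "u01", "u03", "u02", "u04"]

if __name__ == "__main__":
    for chunk in iterate_all(IDS, 2):
        print(chunk)
```

Because each request carries only the last identifier seen, the server keeps no cursor state between chunks; the trade-off is that the ordering attribute must be unique and the comparison rule (lexical here) must be fixed.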

</Details>


On Apr 5, 2011, at 6:32 AM, John, Anil wrote:

> A correction to my original e-mail below:
>
> "...authoritative sources of data that have existing processes in  
> place for Attribute Management and as such Updates/Deletes etc are  
> *NOT* permitted" [via the SPML interface]
>
> Yes, this would be the minima needed to support conformance to the  
> profile. If you support more, that would be a good differentiator  
> for the implementation.
>
> As to support for logical operators beyond 'AND' and the set of  
> matching-operators, this is where I need a bit of help. Ideally I  
> would like to have support for AND/OR/NOT combined with  (=)/(>)/(<)/ 
> (>=)/(<=) applied to a specific subset of case-insensitive  
> attributes, but I don't have a sense of how expensive the operations  
> are to implement. Would appreciate some feedback on that point.
>
> Regards,
>
> - Anil
>
> ________________________________________
> From: Gary Cole [gary.cole@oracle.com]
> Sent: Monday, April 04, 2011 6:03 PM
> To: John, Anil
> Cc: Smith, Thomas C.; OASIS PSTC
> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>
> Thanks!  I think I begin to understand.  Let me play it back to you  
> to be sure I have it right.  The goal is to simplify search() and  
> constrain it so that, rather than being open-ended, search becomes  
> simpler to implement and to test.  The first dimension to constrain  
> is that of supported logical operators: AND would be the only  
> operator.  Another dimension to constrain would be the set of  
> queryable attributes: here a provider would need a way to advertise  
> the subset of attributes on which it supports search.
>
> Do I have that right?  If so, let me ask a few more questions.
>
> You didn't mention constraining the set of matching-operators; would  
> you want to limit/require certain of these?  I assume that you'd  
> want equals (=).  Would you also want greater-than (>), less-than  
> (<), greater-than-or-equal-to (>=), less-than-or-equal-to (<=)?   
> What about startsWith, endsWith, contains?  Could you get by with  
> just equals and startsWith?  (The operators endsWith and contains  
> tend to be rather expensive and inefficient.)  For matching-operators
> on alphabetic values, there's the further issue of case-sensitivity.
> Would you prefer to specify case-insensitive matching or
> case-sensitive matching?
>
> Would this be specifying (in effect) minima?  That is, you would not  
> object if a provider supported more in the way of search than you  
> require, as long as the specified behavior is supported in a  
> standard way, right?
>
> On Apr 4, 2011, at 9:01 AM, John, Anil wrote:
>
> My perspective on this is being driven by the need to implement a  
> batch/occasionally-connected interface to an Attribute Provider (AP)  
> that uses SPML as the interface specification. The motivator for the  
> “Read Only” portion is that AP is fronting authoritative sources of  
> data that have existing processes in place for Attribute Management  
> and as such Updates/Deletes etc are permitted.
>
> The assumption in this case is that the “Attribute Contract” that
> is supported by the AP is known and fixed, i.e., there is a finite
> set of attributes that are exposed via this interface and are
> advertised via the listTargets operation.
>
> At the same time, one of the items that came out of the Burton Group
> discussions around SPML was that implementing a provider that
> supported all permutations of the ‘and’, ‘or’ and ‘not’ operators
> combined with all attributes was non-trivial, which seems to have
> ended with little to no support and *no way to verify what support
> existed* in individual products.
>
> So, what I’d like to see in this read only profile is a way to  
> provide a mechanism that limits Search using a combination of  
> specific operators and attributes. i.e. I will allow queries that  
> allow only specific operators combined with specific clauses.  E.g.  
> Allow only ‘and’ operations on attributes X, Y and Z.
>
> The two use cases that are expected to be enabled by this are:
>
> 1)      Ability to query the AP to retrieve attributes of multiple  
> users all in one shot, potentially for provisioning use cases
> 2)      Ability to do a one-way sync from the AP to a local
> system, i.e., the AP will always be the master system that will
> overwrite the local store
>
> The key here is to make sure that the profile itself is constrained  
> enough that it is implementable and testable.
>
> I am not sure if I answered your question to the level of detail you  
> are looking for but I am hoping that there is general interest in  
> such a capability.
>
> Regards,
>
> -        Anil
>
>
>
> From: Gary Cole [mailto:gary.cole@oracle.com]
> Sent: Tuesday, March 29, 2011 2:44 PM
> To: John, Anil
> Cc: Smith, Thomas C.; OASIS PSTC
> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>
> In what ways would you constrain search?  Would these constraints be  
> minima, maxima or both?
>
> A provider constrains the set of object-classes for which the  
> provider supports search in the listTargetsResponse.
>
> Search as specified in SPMLv2 is at heart a subset of LDAP search:
> - scope: 'pso', 'oneLevel' or 'subTree'
> - operators: 'and', 'or', and 'not'
> - clauses: dependent on profile or provider, but examples show  
> <attribute-name> = <attribute-value>.
>
> The  DSML profile looks especially LDAP-like, since DSML is  
> basically LDAP wrapped in a bunch of XML tags.  Nothing in the  
> specification of the base protocol nor the DSML profile requires a  
> provider to support fancier search than it wishes to provide.  The  
> provider simply returns an error if:
>
> •     The provider cannot evaluate an instance of {QueryClauseType}  
> that the instance of {SearchQueryType} contains.
>
> •     The open content of the instance of {SearchQueryType} is too  
> complex for the provider to evaluate.
>
> In short, it's entirely up to the provider how fancy to get with  
> search().  We figured that market-pressures would incent each  
> implementer to support search appropriately in its provider.  For  
> example, SIM supported search on every type of object.  OIM 9.x  
> supports DSML search for users.
>
> Gary
>
> On Mar 29, 2011, at 8:39 AM, John, Anil wrote:
>
>
> Gary,
>
>
> Search gives you by default the equivalent of a batch lookup.
>
> So it does, and a constrained-search profile would meet the  
> functionality we are looking for (based on your description below).   
> Our perspective was shaped in a lot of ways by the reluctance of  
> product implementations to implement anything beyond the basic  
> operations on a SPML provider.  Search is an optional capability  
> (and so is batch, but thought that it would be easier to make a case  
> for it).
>
> I would be interested to get the perspective of folks who are  
> product implementors to see what would be "easier" to implement for  
> them going forward. At the end of the road, we are looking for  
> something that will exist in real-life within products and not just  
> as shelf-ware.
>
> Regards,
>
> - Anil
>
> ________________________________________
> From: Gary Cole [gary.cole@oracle.com]
> Sent: Tuesday, March 29, 2011 9:20 AM
> To: John, Anil
> Cc: Smith, Thomas C.; OASIS PSTC
> Subject: Batch Lookup (was "Re: ReadOnlyProfile")
>
> Anil,
>
> On Mar 29, 2011, at 7:43 AM, John, Anil wrote:
>
> We also need to re-read the specs to see if there is overlap between
> lookup() and search on what we need to accomplish.
>
> Remind me again please what you need to accomplish.  I may be able to
> help.
>
> For instance, I may be able to clarify something about your
> requirements for "SPML Operations on an Attribute Service".  You
> originally thought that you needed "batch pull" capabilities because
> SAML Attribute Query could not answer the following questions:
>  * "Give me the unique id's of all users with Attribute X"
>  * "For all users (whose unique id's I just got), give me listing of
> attributes for each (in one shot)"
>
> SPML's Search Capability (section 3.6.7.1 of the main spec) gives you
> all of that in one shot.  You can request one search() and in that
> request use the 'returnData' attribute to specify how much information
> you want back for each matching object:  nothing, identifier-only,
> data (which would include all schema-defined attributes) or
> everything, which would add capability-specific data to schema-defined
> data.  Another parameter allows you to specify which capabilities
> interest you.  In your case, you would specify "returnData='data'", so
> that you would get all of the attributes.  Or you could take the
> default, which is 'everything'.  Unless you have capability-specific
> data, 'everything' is equivalent to 'data'.  A client can also specify
> a maximum limit on the number of matching objects to return.
>
> The Provider may send all of the matching objects in a single
> SearchResult, or the provider may break the results into chunks that
> the requestor can iterate.  Logically, it's still part of a single
> search result, although a series of iterate() requests may be
> necessary to return all matching objects.
>
> So, please help me to understand what a batch operation would add to
> this?  Search gives you by default the equivalent of a batch lookup.
>
> Gary
>
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php


