OASIS Mailing List Archives: xacml message

Subject: Re: [xacml] For Thursday: ABAC and big data

Hi Hal,

On 15/04/2016 2:48 AM, Hal Lockhart wrote:

-----Original Message-----
From: Steven Legg [mailto:steven.legg@viewds.com]
Sent: Wednesday, April 06, 2016 1:11 AM
To: Hal Lockhart; xacml@lists.oasis-open.org
Subject: Re: [xacml] For Thursday: ABAC and big data

Hi Hal,

On 30/03/2016 6:25 AM, Hal Lockhart wrote:
It seems to me that when considering ABAC and big data there are two potential scenarios. The first is that access to a large non-SQL database should be protected by policy just as done today with existing databases. The second is the possibility of using the big data itself as input to an access control decision.

Concerning the first, I believe Hadoop, for example, has an access control callout which could easily be mated with an XACML PEP. In fact I hope this project will actually be done at Apache once OpenAz gets better organized. It is one of the reasons we moved the project there. The PEP would use subject information combined with information from the Hadoop query as the source of attributes.

Steven wrote:
The architectures of large no-SQL databases may present some performance issues for ABAC.

Firstly, regardless of the database, fine-grained access control typically requires an order of magnitude more authorization requests than the number of entities considered by a query. If those authorization requests are made to an external PDP then the messaging traffic increases dramatically. Of course, the answer to that is to embed the PDP in the database application so the authorization requests are all internal. However, "internal" may not be that internal where big data is concerned.

The large no-SQL databases are able to scale indefinitely in size because the data are spread over an increasing number of database nodes, none of which contains a complete copy of the database. As the database size increases, the chance that a key lookup can be satisfied by a local database node diminishes, so the context handler of our embedded PDP will often be doing a remote lookup for the attributes of the access subject and other entities. This assumes that the access subject and other entities are also stored in the no-SQL database. The remote lookups could be reduced if every embedded PDP had access to a local copy of at least the complete user data, which may not be a palatable architectural solution for a variety of reasons. Going back to user entities stored in the no-SQL database, another assumption is that the subject-id is the key for looking them up. If instead a search is required to find an entity, it will be a more expensive distributed search. To further exacerbate the situation, the log-structured merge trees that give some no-SQL databases their phenomenal write performance sacrifice query performance to achieve it.

With big data databases collecting copious amounts of information about users (say as customers) it wouldn't be strange for those users to have a say in how that information is used through privacy preferences, but applying privacy preferences in a big data database would be particularly challenging. Two possible solutions are to generate a separate XACML policy for each user's preferences, or to store the preferences as an XML document or nested entity in the user's entity and have a single XACML policy that evaluates the preferences for any given user.
The former means there is a very large number of XACML policies to work through on each authorization decision, perhaps too many for each embedded PDP to have a copy, and most of them will not be applicable. The latter means lots of remote lookups or distributed searches for user entities if they are in the no-SQL database, since a typical query will touch records pertaining to many different users.
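The second approach can be sketched in miniature. This is an illustrative model only, with hypothetical data structures standing in for the no-SQL user entities and the XACML preference evaluation; the point is that a single generic policy works for every user, but each decision costs a lookup of the owning user's entity:

```python
# Sketch of the single-policy approach: one generic rule that consults the
# preferences stored on each user's own entity (all names hypothetical).
USER_STORE = {  # stand-in for user entities held in the no-SQL database
    "alice": {"preferences": {"share_email": False, "share_history": True}},
    "bob":   {"preferences": {"share_email": True,  "share_history": False}},
}

def permit(record_owner: str, requested_use: str) -> bool:
    # One lookup per distinct owner touched by the query -- this is the
    # remote-lookup cost the text warns about when owners live in the
    # no-SQL database rather than a local cache.
    user = USER_STORE.get(record_owner)
    if user is None:
        return False  # deny by default when the owner can't be resolved
    return user["preferences"].get(requested_use, False)

print(permit("alice", "share_history"))  # True
print(permit("bob", "share_history"))    # False
```

A typical query touches records owned by many different users, so the `USER_STORE` lookup above turns into many remote fetches or distributed searches, which is exactly the cost being weighed against maintaining one XACML policy per user.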

Big data databases aren't like the databases we otherwise deal with and that means we have to approach ABAC for them somewhat differently.


Several points.

1. I don't understand the reasoning behind this statement: "Firstly, regardless of the database, fine-grained access control typically requires an order of magnitude more authorization requests than the number of entities considered by a query." Perhaps you are making different assumptions than I am.

I use LDAP/X.500 and X.500 Basic Access Control (BAC) as a poster child for
fine-grained access control since I'm familiar with it by implementation.
For an entry to be returned by an LDAP or X.500 search the user must have Browse
permission on the entry. To return an attribute of the entry the user must
additionally have Read permission on the attribute type. To return a value of the
attribute the user must additionally have Read permission on that value (I did say
fine-grained, after all). If the attribute value matched an item in the search
filter then FilterMatch permission is also required for the attribute type and
attribute value. So for a search filtering two attributes that matches one entry
with ten single-valued attributes we are looking at 25 permissions to check, or in
other words, 25 authorization decisions. That's over one order of magnitude and
pushing up towards two. Any access control scheme at the granularity of attributes
and/or attribute values, which is what I consider to be fine-grained, is going to
be looking at many more authorization decisions than the number of
entities/entries/records considered.
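The arithmetic in that example can be spelled out as a short sketch (the counting follows the X.500 Basic Access Control permissions named above; the function itself is just illustrative bookkeeping):

```python
# Count the X.500 BAC permission checks for a search result: Browse per
# entry, Read per attribute type and per attribute value, and FilterMatch
# on both the type and the value of each attribute matched by the filter.
def bac_checks(entries: int = 1, attrs_per_entry: int = 10,
               filtered_attrs: int = 2) -> int:
    browse = entries                           # Browse on each returned entry
    read_types = entries * attrs_per_entry     # Read on each attribute type
    read_values = entries * attrs_per_entry    # Read on each value (single-valued)
    filter_match = entries * filtered_attrs * 2  # FilterMatch on type and value
    return browse + read_types + read_values + filter_match

print(bac_checks())  # 1 + 10 + 10 + 4 = 25 authorization decisions for 1 entry
```

Scaling the entry count scales the decisions linearly, so a query over thousands of entries implies tens of thousands of fine-grained authorization decisions.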

2. I did not state it, but I was assuming that Subject information would be obtained not from the big data source but from the usual sources, via LDAP, SQL, SAML or OIDC, etc. I meant that Resource and perhaps Action attributes would be obtained from the query itself.

3. Even when data are spread over multiple DB nodes, there has to be an initial query handler to farm out the queries and assemble the results. I assumed that the PEP could be located at this entity. Perhaps this is not realistic.

It depends on who you are. A database developer has a choice. Depending on the
XACML policies there may be an advantage to authorization checks at each database
node instead.

An application developer relies on the database developer providing the right
hooks. If it's possible then putting the PEP in the initial query handler would
seem to be better if the attributes needed for authorization come from the
database in question. The PEP may need to be put in the application if attributes
come from other sources and the initial query handler isn't well placed to call
out to those sources.

4. With regard to message traffic, my assumption is that if you care about performance at all, i.e. we are not talking about a demo or PoC, the PEP is in the same process as the PDP.

But where is the PIP? In the case of my LDAP/X.500 directory server making
authorization decisions it is its own PIP. I don't have to worry whether the PEP
in the search processor has collected the right attributes for making authorization
decisions because the PDP's context handler is using the same entry cache and can
get the attributes it needs for itself just as efficiently.

Now put the context handler in the "initial query handler" process with the PEP.
If it needs to fetch more attributes they are most likely to be on different
hosts. To minimize the cost we would want the PEP to gather all the necessary
attributes before presenting an authorization request to the context handler and
PDP. The access-subject probably is, or can be, cached locally, so it's okay. The
resources are returned from database nodes so the query handler has to make sure
it asks for all the attributes it will need for authorization decisions, in the
worst case by asking for everything. There probably isn't a low-cost way of
prefetching related entities or flattened attributes. So it becomes a case of
finding a balance between wasting time and effort collecting stuff up front that
turns out not to be needed, and expending time and effort having to go fetch
missing stuff later (at more expense than if you chose to ask for it beforehand).
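One way to frame that balance is as an expected-cost comparison. The numbers below are purely illustrative assumptions, not measurements; the shape of the trade-off is what matters: prefetching everything wins when late fetches are much dearer per attribute and the hit rate is not too low.

```python
# Illustrative cost model for the prefetch trade-off described above.
# p      = probability a given extra attribute is actually needed
# c_pre  = cost of fetching an attribute up front (batched with the query)
# c_late = cost of fetching it later (a separate remote round trip)
def expected_cost(p: float, c_pre: float, c_late: float, n_attrs: int):
    prefetch_all = n_attrs * c_pre          # pay for everything up front
    fetch_on_demand = n_attrs * p * c_late  # pay only when needed, but dearly
    return prefetch_all, fetch_on_demand

pre, lazy = expected_cost(p=0.25, c_pre=1.0, c_late=10.0, n_attrs=50)
print(pre, lazy)  # 50.0 vs 125.0: prefetching wins with these assumed costs
```

With a low enough hit rate (say p=0.05) the on-demand strategy wins instead, which is why there is no one right answer in the text above, only a balance to find per deployment.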

I envision a world where every process contains an embedded PDP which loads its policies at startup and updates them on admin command.
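That embedded-PDP pattern can be sketched as a small class. This is a hypothetical structure, not any particular XACML implementation: policies are loaded once at startup and swapped atomically when an admin command arrives, so decisions never see a half-updated policy set.

```python
# Sketch of an embedded PDP: policies load at startup and are replaced
# wholesale on an admin command (structure and names are hypothetical).
import threading

class EmbeddedPDP:
    def __init__(self, policies):
        self._lock = threading.Lock()
        self._policies = policies          # loaded at process startup

    def reload(self, new_policies):
        # Admin-triggered update; decisions in flight keep their snapshot.
        with self._lock:
            self._policies = new_policies

    def decide(self, request: dict) -> str:
        with self._lock:
            policies = self._policies      # take a consistent snapshot
        for rule in policies:              # first applicable rule permits
            if rule(request):
                return "Permit"
        return "Deny"

pdp = EmbeddedPDP([lambda req: req.get("role") == "admin"])
print(pdp.decide({"role": "admin"}))  # Permit
pdp.reload([lambda req: False])       # admin command swaps in a deny-all set
print(pdp.decide({"role": "admin"}))  # Deny
```

Because the PDP lives in the same process as the PEP, the only remaining remote traffic is policy distribution and whatever attributes the context handler cannot find locally.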

5. Big data is usually about computing values over large datasets. I assume if the data is privacy-sensitive it would be anonymized prior to being loaded into the DB. (Yes, I am aware of the issues in doing this.) I cannot imagine a big data app where you would be checking privacy preferences over the many thousands or millions of records you are using to determine, for example, the speed of traffic on some highway, whether or not you were using XACML or some other sort of access control.

Big data is about many things. I know of one service provider that stores all
the user records for all their organizational customers in a big data DB. The
DB is their identity store; the exact opposite of anonymized.

And of course the "speed of traffic on some highway" was extracted from the GPS
logs of (consenting?) smart phone users :-). I suspect there is a lot of raw data
full of PII that only gets sanitized on export. It seems popular to collect data
in detail now and think about what useful information might be gleaned from it.


Concerning using the big data itself for access control decisions, I can't think of an obvious use case. XACML normally deals with attributes like group or department which have a single value or a small number of values. I can imagine something like a sensor network (IoT) where you would want to sample the environment and periodically adjust some metric which in turn is used as a policy input. For example, if the number of transactions per second or the number of attacks or the amount of snowfall reaches some threshold, you might want to adjust the access control rules. This would not be done by modifying policy, but by including in the policy some reference to the attribute which reflects the changing state.

Steven wrote:
That attribute is something you would want to periodically compute and store rather than calculate on demand during authorization requests so as to avoid expensive distributed transactions in the big data database.


Yes, that was my assumption. I never meant to suggest that it would be computed at AC decision time. Rather, I envisioned it being some kind of global environment attribute, like DEFCON in a national security context.
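The pattern both sides agree on here can be sketched briefly. All names below are hypothetical; the point is that the policy text stays fixed and references an environment attribute, while a periodic job, never the decision path, recomputes that attribute's value:

```python
# Sketch: a fixed rule that reads a periodically refreshed environment
# attribute instead of being rewritten when conditions change.
ENVIRONMENT = {"threat-level": 3}  # recomputed on a schedule, not per decision

def refresh_threat_level(new_level: int) -> None:
    # A periodic batch job over the big data writes the attribute here.
    ENVIRONMENT["threat-level"] = new_level

def permit_bulk_export(request: dict) -> bool:
    # The rule itself never changes; only the attribute's value does.
    return (request.get("role") == "analyst"
            and ENVIRONMENT["threat-level"] <= 2)

print(permit_bulk_export({"role": "analyst"}))  # False at threat-level 3
refresh_threat_level(1)
print(permit_bulk_export({"role": "analyst"}))  # True once the level drops
```

This keeps the expensive aggregation off the authorization path entirely, matching the point above that the attribute is computed and stored periodically rather than calculated on demand.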


