
Subject: RE: [xacml] For Thursday: ABAC and big data

-----Original Message-----
From: Steven Legg [mailto:steven.legg@viewds.com] 
Sent: Wednesday, April 06, 2016 1:11 AM
To: Hal Lockhart; xacml@lists.oasis-open.org
Subject: Re: [xacml] For Thursday: ABAC and big data

Hi Hal,

On 30/03/2016 6:25 AM, Hal Lockhart wrote:
> It seems to me that when considering ABAC and big data there are two potential scenarios. The first is that access to a large NoSQL database should be protected by policy just as is done today with existing databases. The second is the possibility of using the big data itself as input to an access control decision.
> Concerning the first, I believe Hadoop, for example, has an access control callout which could easily be mated with an XACML PEP. In fact I hope this project will actually be done at Apache once OpenAz gets better organized. It is one of the reasons we moved the project there. The PEP would use subject information combined with information from the Hadoop query as the source of attributes.

Steven wrote:
The architectures of large NoSQL databases may present some performance issues for ABAC.

Firstly, regardless of the database, fine-grained access control typically requires an order of magnitude more authorization requests than the number of entities considered by a query. If those authorization requests are made to an external PDP then the messaging traffic increases dramatically. Of course, the answer to that is to embed the PDP in the database application so the authorization requests are all internal. However, "internal" may not be that internal where big data is concerned.
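To make the scaling concern concrete, here is a minimal Python sketch of per-row authorization during query evaluation. The `pdp_decide` function and the attribute names are invented stand-ins, not any real XACML engine's API; the point is only that a query touching N rows triggers roughly N authorization decisions, so an out-of-process PDP multiplies network round trips by N, while an embedded PDP turns each check into an in-process call.

```python
# Hypothetical sketch: one authorization decision per candidate row.
# pdp_decide is a trivial stand-in for a PDP; a real engine would
# evaluate XACML policies against the request attributes.

def pdp_decide(subject, action, resource):
    # Toy rule: permit only rows in the subject's own department.
    return resource.get("department") == subject.get("department")

def filter_rows(subject, action, rows):
    # A query over N rows makes ~N authorization requests.
    return [r for r in rows if pdp_decide(subject, action, r)]

rows = [{"id": 1, "department": "sales"},
        {"id": 2, "department": "eng"}]
alice = {"id": "alice", "department": "eng"}
print(filter_rows(alice, "read", rows))  # -> [{'id': 2, 'department': 'eng'}]
```

If `pdp_decide` were a remote call instead of a local function, the loop in `filter_rows` would serialize one network round trip per row, which is exactly the traffic blow-up described above.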

The large NoSQL databases are able to scale indefinitely in size because the data are spread over an increasing number of database nodes, none of which contains a complete copy of the database. As the database size increases, the chance that a key lookup can be satisfied by a local database node diminishes, so the context handler of our embedded PDP will often be doing a remote lookup for the attributes of the access subject and other entities. This assumes that the access subject and other entities are also stored in the NoSQL database.

The remote lookups could be reduced if every embedded PDP had access to a local copy of at least the complete user data, which may not be a palatable architectural solution for a variety of reasons. Going back to user entities stored in the NoSQL database, another assumption is that the subject-id is the key for looking them up. If instead a search is required to find an entity, it will be a more expensive distributed search. To further exacerbate the situation, the log-structured merge trees that give some NoSQL databases their phenomenal write performance sacrifice query performance to achieve it.

With big data databases collecting copious amounts of information about users (say, as customers), it wouldn't be strange for those users to have a say in how that information is used through privacy preferences, but applying privacy preferences in a big data database would be particularly challenging. Two possible solutions are to generate a separate XACML policy for each user's preferences, or to store the preferences as an XML document or nested entity in the user's entity and have a single XACML policy that evaluates the preferences for any given user.

The former means there is a very large number of XACML policies to work through on each authorization decision, perhaps too many for each embedded PDP to hold a copy, and most of them will not be applicable. The latter means lots of remote lookups or distributed searches for user entities if they are in the NoSQL database, since a typical query will touch records pertaining to many different users.
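The second approach can be sketched as follows. The `users` store, the `prefs` field, and the preference names are all invented for illustration: one generic rule reads each record owner's stored preferences, instead of maintaining one XACML policy per user, but every decision then requires fetching that owner's entity.

```python
# Hypothetical sketch of the "single generic policy" approach:
# preferences live inside each user's entity, and one rule
# evaluates them for whichever user a record belongs to.

users = {
    "carol": {"prefs": {"allow_marketing": False,
                        "allow_analytics": True}},
}

def decide(purpose, record_owner):
    # One lookup of the owner's entity per decision; in a sharded
    # store this is the remote lookup cost described above.
    prefs = users[record_owner]["prefs"]
    return prefs.get("allow_" + purpose, False)

print(decide("analytics", "carol"))  # -> True
print(decide("marketing", "carol"))  # -> False
```

A query touching records owned by many different users forces one such entity fetch per distinct owner, which is where the remote lookups accumulate.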

Big data databases aren't like the databases we otherwise deal with and that means we have to approach ABAC for them somewhat differently.


Several points.

1. I don't understand the reasoning behind this statement: "Firstly, regardless of the database, fine-grained access control typically requires an order of magnitude more authorization requests than the number of entities considered by a query." Perhaps you are making different assumptions than I am.

2. I did not state it explicitly, but I was assuming that Subject information would be obtained not from the big data source but from the usual sources, via LDAP, SQL, SAML or OIDC, etc. I meant that Resource and perhaps Action attributes would be obtained from the query itself.

3. Even when data are spread over multiple DB nodes, there has to be an initial query handler to farm out the queries and assemble the results. I assumed that the PEP could be located at this entity. Perhaps this is not realistic.

4. With regard to message traffic, my assumption is that if you care about performance at all, i.e. we are not talking about a demo or PoC, the PEP is in the same process as the PDP. I envision a world where every process contains an embedded PDP which loads its policies at startup and updates them on admin command.
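That embedded-PDP lifecycle (load at startup, swap on admin command) can be sketched minimally. The `EmbeddedPDP` class and `policy_source` callback are invented names, not any real engine's API; the point is that decisions are plain in-process calls and the policy set is replaced atomically on reload.

```python
# Hypothetical embedded PDP: policies are loaded once at startup
# and replaced atomically when an admin issues a reload command,
# so decide() never leaves the process.

class EmbeddedPDP:
    def __init__(self, policy_source):
        # policy_source: callable returning the current list of rules,
        # each rule being a predicate over a request dict.
        self._policy_source = policy_source
        self._policies = policy_source()      # load at startup

    def reload(self):
        # Admin command: re-read and atomically swap the policy set.
        self._policies = self._policy_source()

    def decide(self, request):
        # Permit if any loaded rule permits; purely in-process.
        return any(rule(request) for rule in self._policies)

pdp = EmbeddedPDP(lambda: [lambda req: req.get("role") == "admin"])
print(pdp.decide({"role": "admin"}))  # -> True
```

Because `_policies` is replaced in a single assignment, in-flight decisions see either the old set or the new set, never a half-updated one.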

5. Big data is usually about computing values over large datasets. I assume that if the data is privacy-sensitive it would be anonymized prior to being loaded into the DB. (Yes, I am aware of the issues in doing this.) I cannot imagine a big data app where you would be checking privacy preferences over the many thousands or millions of records you are using to determine, for example, the speed of traffic on some highway, regardless of whether you were using XACML or some other sort of access control.

> Concerning using the big data itself for access control decisions, I can't think of an obvious use case. XACML normally deals with attributes like group or department which have a single value or a small number of values. I can imagine something like a sensor network (IoT) where you would want to sample the environment and periodically adjust some metric which in turn is used as a policy input. For example, if the number of transactions per second, the number of attacks, or the amount of snowfall reaches some threshold, you might want to adjust the access control rules. This would not be done by modifying policy, but by including in the policy some reference to the attribute which reflects the changing state.

Steven wrote:
That attribute is something you would want to periodically compute and store rather than calculate on demand during authorization requests so as to avoid expensive distributed transactions in the big data database.


Yes, that was my assumption. I never meant to suggest that it would be computed at AC decision time. Rather, I envisioned it being some kind of global environment attribute, like DEFCON in a national security context.
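A sketch of that pattern, with invented names throughout: a background job periodically refreshes an environment attribute (`alert_level` here, playing the DEFCON role), and the policy merely references it, so enforcement tightens without any policy edit or decision-time computation.

```python
# Hypothetical sketch: the policy references a periodically computed
# environment attribute rather than computing it per decision.

ENV = {"alert_level": 2}   # refreshed by a background job, not per request

def policy_permits(request):
    # Rule tightens automatically when the precomputed attribute rises;
    # the policy text itself never changes.
    if ENV["alert_level"] >= 4:
        return request["clearance"] == "high"
    return True

print(policy_permits({"clearance": "low"}))   # -> True
ENV["alert_level"] = 5                         # background job raises level
print(policy_permits({"clearance": "low"}))   # -> False
```

In XACML terms, `alert_level` would be an environment-category attribute supplied by the context handler; the expensive aggregation over the big data happens on the refresh schedule, never inside an authorization request.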


> Hal
