
Subject: Re: [xacml] For Thursday: ABAC and big data

From: Steven Legg <steven.legg@viewds.com>
To: Hal Lockhart <hal.lockhart@oracle.com>, xacml@lists.oasis-open.org
Date: Wed, 6 Apr 2016 15:10:40 +1000


Hi Hal,

On 30/03/2016 6:25 AM, Hal Lockhart wrote:

It seems to me that when considering ABAC and big data there are two potential scenarios. The first is that access to a large non-SQL database should be protected by policy just as done today with existing databases. The second is the possibility of using the big data itself as input to an access control decision.

Concerning the first, I believe Hadoop, for example, has an access control callout which could easily be mated with an XACML PEP. In fact I hope this project will actually be done at Apache once OpenAz gets better organized. It is one of the reasons we moved the project there. The PEP would use subject information combined with information from the Hadoop query as the source of attributes.


The architectures of large no-SQL databases may present some performance issues for
ABAC.

Firstly, regardless of the database, fine-grained access control typically requires
an order of magnitude more authorization requests than the number of entities
considered by a query. If those authorization requests are made to an external PDP
then the messaging traffic increases dramatically. Of course, the answer to that is
to embed the PDP in the database application so the authorization requests are all
internal. However, "internal" may not be that internal where big data is concerned.
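
A rough sketch of that arithmetic, assuming "fine-grained" means one decision per (entity, attribute) pair returned by a query. The function names, the figure of ten attributes per entity, and the round-trip time are illustrative assumptions only, not measurements.

```python
def authorization_requests(entities: int, attributes_per_entity: int) -> int:
    """One decision per (entity, attribute) pair touched by the query."""
    return entities * attributes_per_entity


def external_pdp_time_ms(requests: int, round_trip_ms: float) -> float:
    """Total messaging time if every request is a separate call to an external PDP."""
    return requests * round_trip_ms


# Assumed workload: a query considering 1,000 entities with 10 attributes each.
requests = authorization_requests(entities=1_000, attributes_per_entity=10)
total_ms = external_pdp_time_ms(requests, round_trip_ms=0.5)
print(requests, total_ms)
```

With about ten attributes per entity the request count is an order of magnitude above the entity count, which is why the per-request messaging cost dominates when the PDP is external.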

The large no-SQL databases are able to scale indefinitely in size because the data
are spread over an increasing number of database nodes, none of which contain a
complete copy of the database. As the database size increases the chance that a key
lookup can be satisfied by a local database node diminishes, so the context
handler of our embedded PDP will often be doing a remote lookup for the attributes
of the access subject and other entities. This assumes that the access subject
and other entities are also stored in the no-SQL database. The remote lookups
could be reduced if every embedded PDP had access to a local copy of at least the
complete user data, which may not be a palatable architectural solution for a
variety of reasons. Going back to user entities stored in the no-SQL database,
another assumption is that the subject-id is the key for looking them up. If
instead a search is required to find an entity it will be a more expensive
distributed search. To make matters worse, the log-structured merge
trees that give some no-SQL databases their phenomenal write performance
sacrifice query performance to achieve it.
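
The locality argument can be sketched numerically. This assumes keys are spread uniformly over the nodes and each key is held on a fixed number of replicas; both are simplifying assumptions of mine, and real partitioners and replication schemes vary.

```python
def local_hit_probability(nodes: int, replicas: int = 1) -> float:
    """Chance that a given node holds a copy of a uniformly placed key."""
    if nodes < 1 or replicas < 1:
        raise ValueError("nodes and replicas must be positive")
    return min(1.0, replicas / nodes)


def expected_remote_lookups(lookups: int, nodes: int, replicas: int = 1) -> float:
    """Expected number of attribute lookups that must leave the local node."""
    return lookups * (1.0 - local_hit_probability(nodes, replicas))


# As the cluster grows, the local hit rate for an embedded PDP shrinks.
for n in (4, 16, 64, 256):
    print(n, local_hit_probability(n, replicas=1))
```

Under these assumptions the local hit probability falls as replicas/nodes, so on a large cluster nearly every context handler lookup is remote.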

With big data databases collecting copious amounts of information about users
(say as customers) it wouldn't be strange for those users to have a say in how
that information is used through privacy preferences, but applying privacy
preferences in a big data database would be particularly challenging. Two possible
solutions are to generate a separate XACML policy for each user's preferences, or
to store the preferences as an XML document or nested entity in the user's entity
and have a single XACML policy that evaluates the preferences for any given user.
The former means there is a very large number of XACML policies to work through
on each authorization decision, perhaps too many for each embedded PDP to have
a copy, and most of them will not be applicable. The latter means lots of remote
lookups or distributed searches for user entities if they are in the no-SQL
database, since a typical query will touch records pertaining to many different
users.
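
The trade-off between the two options can be sketched with a toy model. Nothing here is real XACML: a "policy" is reduced to a pair of owner and allowed purposes, and the function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical model: a "policy" is just (owner, purposes the owner allows).
Policy = Tuple[str, Set[str]]


def decide_with_per_user_policies(
    policies: List[Policy], owner: str, purpose: str
) -> Tuple[str, int]:
    """Option 1: one policy per user; returns (decision, policies examined)."""
    examined = 0
    for policy_owner, allowed in policies:
        examined += 1
        if policy_owner == owner:  # the only applicable policy
            return ("Permit" if purpose in allowed else "Deny"), examined
    return "NotApplicable", examined


def decide_with_single_policy(
    fetch_preferences: Callable[[str], Set[str]], owner: str, purpose: str
) -> str:
    """Option 2: one generic policy; preferences are fetched per record owner."""
    return "Permit" if purpose in fetch_preferences(owner) else "Deny"


# Toy data: three users and the purposes each has consented to.
preferences: Dict[str, Set[str]] = {
    "alice": {"billing", "marketing"},
    "bob": {"billing"},
    "carol": set(),
}
policies: List[Policy] = list(preferences.items())

lookups: List[str] = []  # records every (possibly remote) preference fetch


def fetch(owner: str) -> Set[str]:
    lookups.append(owner)
    return preferences[owner]


# A query touching one record per user, for the purpose "marketing".
owners = ["alice", "bob", "carol"]
option1 = [decide_with_per_user_policies(policies, o, "marketing") for o in owners]
option2 = [decide_with_single_policy(fetch, o, "marketing") for o in owners]
print(option1, option2, len(lookups))
```

In this model option 1 pays in policies examined per decision, while option 2 pays one preference fetch per distinct record owner, each of which may be a remote lookup.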

Big data databases aren't like the databases we otherwise deal with and that
means we have to approach ABAC for them somewhat differently.


Concerning using the big data itself for access control decisions, I can’t think of an obvious use case. XACML normally deals with attributes like group or department which have a single value or a small number of values. I can imagine something like a sensor network (IoT) where you would want to sample the environment and periodically adjust some metric which in turn is used as a policy input. For example, if the number of transactions per second or the number of attacks or the amount of snowfall reaches some threshold, you might want to adjust the access control rules. This would not be done by modifying policy, but by including in the policy some reference to the attribute which reflects the changing state.


That attribute is something you would want to periodically compute and store
rather than calculate on demand during authorization requests so as to avoid
expensive distributed transactions in the big data database.
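
A minimal sketch of the compute-and-store pattern, assuming some scheduler (not shown) calls refresh() periodically. The class, the threshold rule, and the sample values are all hypothetical.

```python
from typing import Callable, List


class StoredAttribute:
    """Holds the last computed value; reading it never triggers recomputation."""

    def __init__(self, compute: Callable[[], float]) -> None:
        self._compute = compute
        self._value = compute()  # initial value

    def refresh(self) -> None:
        """Called periodically by a scheduler (not shown), not by the PDP."""
        self._value = self._compute()

    def get(self) -> float:
        return self._value


def decide(attribute: StoredAttribute, threshold: float) -> str:
    """Hypothetical rule: permit only while the metric is below the threshold."""
    return "Permit" if attribute.get() < threshold else "Deny"


# Stand-in for an expensive distributed aggregation over the big data store.
samples = iter([120.0, 950.0])
compute_calls: List[float] = []


def transactions_per_second() -> float:
    value = next(samples)
    compute_calls.append(value)
    return value


tps = StoredAttribute(transactions_per_second)
before = [decide(tps, threshold=500.0) for _ in range(1000)]
tps.refresh()  # the periodic job runs once
after = decide(tps, threshold=500.0)
print(set(before), after, len(compute_calls))
```

The thousand decisions read the stored value without touching the database; the expensive computation runs only at construction and on each refresh.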

Regards,
Steven

Hal
