xacml message

Subject: Re: AW: [xacml] Issue #11 again
From: Erik Rissanen <erik@axiomatics.com>
To: Jan Herrmann <herrmanj@in.tum.de>
Date: Fri, 23 Oct 2009 14:21:02 +0200
Hi Jan,

See inline.

Jan Herrmann wrote:
> Hi Rich,
> see comment inline.
> best regards
> jan
>
>
>  
>   
>> -----Ursprüngliche Nachricht-----
>> Von: Erik Rissanen [mailto:erik@axiomatics.com]
>> Gesendet: Donnerstag, 22. Oktober 2009 15:22
>> An: XACML TC
>> Betreff: [xacml] Issue #11 again
>>
>> All,
>>
>> Jan is correct that there is a potential performance problem with XPath
>> vs regexp. As he has demonstrated, it doesn't have to do with the
>> performance of a regexp implementation vs XPath implementation. It has
>> to do with the manner in which the expressions are used.
>>
>> With the xpath based approach this is what happens (at least in a naïve
>> implementation):
>>
>> 1. The policy contains an xpath expression and uses an xpath matching
>> function.
>>
>> 2. The PDP receives the multiple request with an xpath and expands this
>> into N resources.
>>
>> 3. Each resource is evaluated against the policy. To do this the xpath
>> matching function will evaluate two xpath expressions. The individual
>> resource id, which will select a single node, and the xpath expression
>> from the policy, which will typicall evaluate to multiple nodes, let's
>> say there are R of them. It will then check if the individual node is in
>> among the R nodes.
>>
>>
>> With a regexp matching function this will happen (I use the term "xpath
>> expression equivalent" here since it is still an open question which
>> language we will use here):
>>
>> 1. The policy contains an "xpath expresssion equivalent" and uses a
>> regexp matching function
>>
>> 2. The PDP receives the multiple request with an "xpath equivalent" and
>> expands this into N resources.
>>
>> 3. Each resource is evaluated against the policy. To do this the regpexp
>> matching function will work directly on the expression of the individual
>> resource id.
>>
>>
>> Notice the difference in step three. In the second case there is no
>> requirement that the actual R nodes are enumerated.
>>
>> I don't know what would be possible to do in the first case. A naïve
>> implementation would evaluate the xpath from the policy into a result
>> set of R nodes, and then check whether the individual resource-id node
>> is among these nodes. That would require iterating over up to R nodes,
>> and is likely less efficient than the regexp approach.
>>
>> Now, maybe it would be possible to run the xpath evaluation "backwards"?
>> That is, given a node in an XML tree, can we ask whether a given xpath
>> expression would select that node, without having to evaluate the full
>> set of nodes? I don't know. If it would be possible, then an
>> implementation of the xpath approach can have similar performance as the
>> regexp approach.
>>     
>
> If I understand your idea right, it seems to me that this actually the idea
> behind the reg-expr based approach. generate a name=xpath-expression for a
> node and match this against reg-expr-literals in the rules.
> If there is a difference it would be helpful if you could explain in more
> detail.
>   

No, this is nothing like the regexp based approach. What I mean is that 
an XPath API is typically like this:

matchingNodes = xpathExpr.evaluate(contextNode)

that is, given the xpath expression, the context node the API will 
return a set of matching nodes. With this API, the naïve implementation 
will be like this:

boolean xpath-node-equal(xpath1, xpath2) {
  nodes1 = xspath1.evaluate(contentNode)
  nodes2 = xspath2.evaluate(contentNode)
  if(nodes2.contain(node1))
    return true;
  else
    return false;
}

the problem here is that this requires enumerating the potentially large 
set of nodes "nodes2".

What I meant is that perhaps we could instead have an API which looks 
like this:

boolean xpathExpr.matches(contextNode, anotherNode)

that is, given the xpath expression, the context node and another node, 
the API would return true/false on whether the expression matches 
"anotherNode".

If this operation can be performed efficiently, we could implement the 
xpath-node-equal function efficiently like this:

boolean xpath-node-equal(xpath1, xpath2) {
  nodes1 = xspath1.evaluate(contentNode)
  return xspath2.matches(contentNode, nodes1)
}

note that "nodes1" is a singleton node, which can probably be found 
efficiently (assuming that xpath is implemented efficiently). There 
would be no need to enumerate all the nodes matched by xpath2.

I haven't checked if this is possible.

Best regards,
Erik
References:
- Issue #11 again
  - From: Erik Rissanen <erik@axiomatics.com>
- AW: [xacml] Issue #11 again
  - From: "Jan Herrmann" <herrmanj@in.tum.de>