xacml message

Subject: Re: [xacml] Issue 11 and running XPath in reverse

From: "Rich.Levinson" <rich.levinson@oracle.com>
To: Erik Rissanen <erik@axiomatics.com>, XACML TC <xacml@lists.oasis-open.org>
Date: Thu, 29 Oct 2009 01:27:31 -0400

Hi All,

I have reviewed most of the threads and will reply here, since Erik has responded to most of the issues that have been raised in the context of issue 11.

Also I will refer to the following email as probably the easiest point at which to connect to the points that have been made wrt the URI-reference proposal:
http://lists.oasis-open.org/archives/xacml/200910/msg00058.html
and the following email contains the actual proposal:
http://lists.oasis-open.org/archives/xacml/200910/msg00030.html

Rather than replying inline, I will just make the following points intended to address the discussion:

URI is a syntax, not a language. In particular, it is implicitly a hierarchical syntax, which is already in use in the Hierarchical Profile in section 2.2.

The proposal for xml documents in this context is section 2.2.1, which proposes using the fragment identifier portion of the URI syntax to contain a representation of the XML hierarchy (which is defined as an unambiguous hierarchy by the XPath 2.0 Data Model).

The representation in the fragment identifier is simply the hierarchical sequence of nodes that one obtains by walking the XPath 2.0 Data Model hierarchy and resolving the namespaces along the way. Each node is named with a single string, which is a string representation of the QName associated with that node, and these identifiers are separated by the "/" character as prescribed in section 3.5 of RFC 3986 "Uniform Resource Identifier (URI): Generic Syntax".

RegExp is already supported in XACML, and it is obvious how to apply it to a hierarchical syntax, as anyone who has used wildcards to refer to files in a hierarchical file system will recognize.
With URI syntax, there is no need for a notion of running anything in reverse. The URI is a hierarchical path, and the URI of any requested node is matched by policy URIs that are left-aligned substrings (prefixes) of the requested URI.
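
To make the left-aligned matching concrete, here is a minimal sketch in Python. It is an illustration only, not text from the proposal: the function name in_scope, the example URIs, and the choice to require the match to end on a "/" boundary are all assumptions.

```python
def in_scope(policy_uri: str, requested_uri: str) -> bool:
    """Return True if policy_uri is a left-aligned prefix of requested_uri.

    Assumption: the prefix must end on a "/" boundary, so that a policy
    for ".../record" does not accidentally cover ".../records".
    """
    if requested_uri == policy_uri:
        return True
    prefix = policy_uri.rstrip("/")
    return requested_uri.startswith(prefix + "/")


# Hypothetical URIs; the namespace and document name are made up.
policy = "http://example.com/records.xml#/{urn:example:med}record"
requested = policy + "/{urn:example:med}patient/{urn:example:med}name"

print(in_scope(policy, requested))
print(in_scope(policy, policy + "s"))
```

Only the two strings are consulted; the XML document itself is never touched during the comparison.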

The simplicity of URI syntax derives from the fact that the whole exact hierarchical path to the requested node is contained within the URI syntax itself. There is no need to look anywhere besides the URI of the requested node and the URIs in the policies: each policy URI implicitly defines a scope, and comparing the two strings determines immediately whether the requested node URI is within that scope.
In addition, the proposed URI string syntax is directly transformable to XPath QName syntax.

The URI-reference proposal simply applies the well known and understood hierarchical node name matching techniques to the node names of the xml hierarchy.

The only "catch" is that the XML node names are "QNames", which are two-part names. In order to bring the QName into the URI syntax domain, one must resolve the two parts of the QName, which are two strings, into a single string. The only minor complexity in this seemingly trivial operation is specifying how to join the two strings using the allowed characters. One solution to this problem is to remove the ":" separating the two parts and instead surround the first part (the resolved namespace name) with curly braces: "{", "}". This is known as "Clark notation": http://www.jclark.com/xml/xmlns.htm
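
As a rough illustration of the walk described above, the following Python sketch builds a "/"-separated path of Clark-notation names for one element. It relies on the fact that Python's xml.etree.ElementTree already stores element names in Clark notation. The helper name clark_path and the sample document are assumptions, and the sketch ignores attributes, text nodes, and the position of same-named siblings, which a full scheme would need to address.

```python
import xml.etree.ElementTree as ET


def clark_path(root, target):
    """Build a "/"-separated path of Clark-notation names from root to target."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    steps = []
    node = target
    while node is not None:
        steps.append(node.tag)  # tag is already "{namespace}local"
        node = parent.get(node)
    return "/" + "/".join(reversed(steps))


# Hypothetical sample document and namespace.
doc = ET.fromstring(
    '<m:record xmlns:m="urn:example:med">'
    "<m:patient><m:name>Ann</m:name></m:patient>"
    "</m:record>"
)
name = doc.find("{urn:example:med}patient/{urn:example:med}name")
print(clark_path(doc, name))
# /{urn:example:med}record/{urn:example:med}patient/{urn:example:med}name
```

The resulting string could then serve as the fragment identifier portion of a URI, subject to whatever character escaping the URI syntax requires.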

The proposal is NOT proposing the use of URI "INSTEAD OF" XPATH. As is the case in the existing spec, the URI syntax will continue to exist in addition to the XPath syntax.

These are two representations of the same underlying hierarchical resource model. The specs try to make clear that the "representations" are not normative. They should be thought of as two "ways" to identify the nodes in the hierarchy. The "ways" are provided for reference and are not intended to exclude other "ways".
The URI-reference proposal simply explains how URI can be used to refer to XML nodes, i.e. a hierarchical syntax applied to a specifically represented form of a hierarchy.

Thanks,
Rich

Erik Rissanen wrote:

All,

I have done some searching and thinking about issue 11 from the public review issues list.

In particular, I have been looking into the possibility of evaluating xpath in "reverse". See my earlier email here for what I mean by that:

http://lists.oasis-open.org/archives/xacml/200910/msg00092.html

Most XPath APIs I can find do not implement any method like this. dom4j does have something similar, with this method:

http://www.oschina.net/uploads/doc/dom4j-1.6.1/org/dom4j/Node.html#matches(java.lang.String)

but when I looked at the source code for dom4j, it is implemented as an enumeration of all matching nodes, so it is not of any help.

I looked at the XPath 1.0 spec myself, and from a casual look it appears to me that it would be possible to run it in reverse: start from the last location step, check its predicate, and then move along the axis in the reverse direction. Repeat this until the expression is consumed, and check whether you arrived at the context node. For simple expressions this should be efficient, while expressions that use an axis like "descendant", for instance, could require a lot of searching in the document and become inefficient.
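
A minimal sketch of the reverse idea in Python follows. It handles only the simplest case: an absolute path made of child-axis name tests, with no predicates and no other axes. The names parent_map and matches_reverse and the sample document are assumptions, and this is not how any existing XPath library works.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    """Child -> parent lookup; would be built once per document."""
    return {child: p for p in root.iter() for child in p}


def matches_reverse(parents, node, steps):
    """Check whether node is selected by the absolute path /steps[0]/steps[1]/...

    Starts at the last location step and walks toward the root, so only
    the ancestors of node are visited, never the rest of the document.
    """
    current = node
    for name in reversed(steps):
        if current is None or current.tag != name:
            return False
        current = parents.get(current)  # reverse of the child axis
    # Expression consumed: we must now be above the root element,
    # which stands in for the document node (the context).
    return current is None


doc = ET.fromstring(
    "<record><patient><name>Ann</name></patient>"
    "<visit><name>x</name></visit></record>"
)
parents = parent_map(doc)
print(matches_reverse(parents, doc.find("patient/name"), ["record", "patient", "name"]))
print(matches_reverse(parents, doc.find("visit/name"), ["record", "patient", "name"]))
```

Each check costs at most one step per location step in the expression, regardless of document size. An axis such as "descendant" would instead require trying every ancestor at that step.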

I haven't looked into any of the details, so there might be expressions which cannot be reversed.

Besides running xpath in reverse, another possibility is to cache xpath evaluation results. This would not help in the case of a single request, but would make a big difference in the case of multiple requests (which I think has been the main concern). Consider this process:

1. The policy contains an xpath expression and uses an xpath matching
function.

2. The PDP receives the multiple request with an xpath and expands this
into N resources.

3. The first resource is evaluated against the policy. To do this the xpath matching function will evaluate two xpath expressions: the individual resource id, which will select a single node, and the xpath expression from the policy, which will typically evaluate to multiple nodes, let's say there are R of them. It will then check whether the individual node is among the R nodes. The PDP saves the result of the policy xpath expression for the duration of the multiple request.

4. The remaining resources are evaluated against the policy. The xpath expression in the policy does not need to be evaluated since the result is saved from the first evaluation. The single resource xpath is evaluated and the PDP checks whether the node is among the R nodes in the saved result. This can be done very efficiently using a hash table.
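
The four steps above can be sketched in Python as follows. This is an illustration under assumptions: the class name CachedPolicyMatcher, the counter, and the sample document are invented, and xml.etree.ElementTree supports only a limited subset of XPath, so a real PDP would use a full XPath engine.

```python
import xml.etree.ElementTree as ET


class CachedPolicyMatcher:
    """Caches the node set of each policy xpath for one multiple request."""

    def __init__(self, root):
        self.root = root
        self._cache = {}  # policy xpath -> set of selected elements
        self.policy_evaluations = 0  # how often a policy xpath was really run

    def policy_nodes(self, policy_xpath):
        if policy_xpath not in self._cache:
            self.policy_evaluations += 1
            # Step 3: evaluate once and keep the R nodes in a hash set.
            self._cache[policy_xpath] = set(self.root.findall(policy_xpath))
        return self._cache[policy_xpath]

    def matches(self, policy_xpath, resource_xpath):
        # The resource xpath selects a single node; membership is a hash lookup.
        node = self.root.find(resource_xpath)
        return node is not None and node in self.policy_nodes(policy_xpath)


doc = ET.fromstring(
    "<record><patient><name>A</name></patient>"
    "<patient><name>B</name></patient>"
    "<visit><name>x</name></visit></record>"
)
matcher = CachedPolicyMatcher(doc)
resources = ["patient[1]/name", "patient[2]/name", "visit/name"]
results = [matcher.matches("patient/name", r) for r in resources]
print(results, matcher.policy_evaluations)
```

The policy expression is evaluated once no matter how many resources the multiple request expands into. The cache should be discarded when the request ends, since it is only valid for that document.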

With this procedure multiple requests can be evaluated very efficiently. Given this, I propose that we do not invent new schemes or use regexp matching instead of the XACML xpath functions. The XPath functions have many benefits: they are namespace aware, they are easy to use (avoiding "matching on a matching language"), the XPath spec is already available for us to reference without any work on our part, and XPath implementations are readily available.

There is also a possibility for innovative XACML products to go beyond the off-the-shelf XPath implementations and run some of the expressions in reverse if the PDP thinks it would improve performance.

Best regards,
Erik

