xacml message

Subject: Example of dag and forest used to manage collection of resources for comparison

From: "Rich.Levinson" <rich.levinson@oracle.com>
To: xacml <xacml@lists.oasis-open.org>
Date: Thu, 26 Feb 2009 14:21:18 -0500

Hi All,

At this week's tc meeting, when we discussed the proposed changes to the Hierarchical Resource Profile,
http://lists.oasis-open.org/archives/xacml/200902/msg00056.html
I was asked to provide an example distinguishing the dag and the forest, so that people could more easily understand the core of the issue and its impact on things in general.

It consists of:

some assumptions about a real-world situation, which we will use to frame the concepts
a problem statement
a manual data collection process during which multiple sources of authority define hierarchies associated with a common set of resources
a hypothetical "resource manager program", which can manage the data as a dag or a forest and we will consider the implications
some concluding remarks

Assumptions:

There is a flat collection of resources, which can be anything: records in a database, people, or, for the sake of concreteness, a bunch of large physical metal containers in a storage area or warehouse near a shipping port. The problem to consider is the management of these containers and the customers that use them. Each container has a large painted alphanumeric identifier on it, such as "AT12345XY". The identifiers have no other properties except that there are no duplicates. The containers are very large and typically can be used to provide storage for more than one customer.
We can draw a picture of these containers as a collection of boxes on a piece of paper. The point is that we are going to model the management of these containers both as a forest and as a dag, and use this drawing to conceptualize the problem and the forest and dag solutions.
Let us assume that there are two customers interested in these containers, that they each have their own uses and their own labels for the containers, and that, for whatever reason, they each choose to organize the containers as a single-rooted hierarchy.

Problem statement:

Design a "resource manager" for these containers and the 2 customers so that the hierarchy each customer specifies is used to initially establish the hierarchical relationship each customer has requested among the containers the customer chooses to use.

Manual data collection process to follow:

As Hal suggested at the TC meeting, let's assume we have all the containers drawn as boxes on a piece of paper, which shows all the boxes, each with a serial number in the upper left corner of an otherwise blank box.
Now each customer will have their own sheet and, in their own separate color, draw their own hierarchy on the sheet, which will show a top node and then child nodes recursively, for whatever boxes they choose to be in their collection. In addition to showing the hierarchical lines connecting their boxes, each customer may pick their own "name" for each box, or they can leave it blank and we will use the serial number instead when the data is entered. The only requirement on the customer-specific names is that they be unique on the sheet. It does not matter what any other customer defines on another sheet. If the customer is lazy or simply does not want to think up names, that is ok, because we will be able to use the serial numbers instead.
When the 2 customers are done they will hand in their sheets to the "guy" who runs the warehouse and he will enter the data from the sheets into the resource manager program on his laptop.

The "hypothetical" resource manager program:

Assume the program is a little database, and that an initial database has been set up that has one row for each container, with the container's serial number as the primary key in column zero.
Collect information for both the dag and the forest:
Enter the data for one sheet at a time, i.e. for each sheet, i:

find the top of the customer hierarchy (from the customer drawn diagram, pick the serial number in the box at the top of the customer drawn hierarchy) and
enter the customer-specified name as the (2i-1)th column value after the box serial number, in the row whose identifier equals the serial number of the box on the sheet of paper; that row, by definition, exists in the initial "empty" database (a list of all the available boxes). (If no customer-specified name is supplied, put a copy of the row's serial number in instead.)
for each child (iteratively until all entries processed):

enter the customer-specified name as the (2i-1)th column value after the serial number, in the row whose identifier equals the serial number of the box on the sheet of paper (if no customer-specified name is supplied, put a copy of the row's serial number in instead)
enter the name of the parent node as the (2i)th column value in the same row

When step 3 is complete, we should have all of customer 1's boxes with their customer-selected name in column 1 and the parent's customer-selected name in column 2. And for customer 2's boxes we will have the same information in columns 3 and 4.
This process can be repeated indefinitely for any number of customers, and each customer can choose any subset of boxes with any specific box as the top of the customer specific hierarchy, recognizable by the fact that it is the only entry that does not have a parent value in the 2ith column.
At this point the situation is set up completely so the forest resource manager or the dag resource manager can begin operation.
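
The data-entry steps above can be sketched roughly as follows. This is only an illustration, not part of any spec: the function names, the customer numbers used as keys, the short serial numbers ("AT1" to "AT4") and the customer names ("top", "hub", etc.) are all made up for the example. Each row holds one (name, parent name) pair per customer, which stands in for columns 2i-1 and 2i.

```python
# Hypothetical sketch of the "resource manager" data entry; all names are illustrative.

def new_table(serials):
    """Initial "empty" database: one row per container, keyed by serial number."""
    return {serial: {} for serial in serials}

def enter_sheet(table, customer, sheet):
    """Enter one customer's sheet into the table.

    sheet maps serial -> (customer_name_or_None, parent_serial_or_None).
    The pair stored under `customer` plays the role of columns 2i-1 and 2i.
    """
    def name_of(serial):
        # Use the serial number when the customer supplied no name.
        return sheet[serial][0] or serial

    names = [name_of(serial) for serial in sheet]
    if len(set(names)) != len(names):
        raise ValueError("names must be unique on a sheet")
    for serial, (_, parent) in sheet.items():
        parent_name = name_of(parent) if parent is not None else None
        table[serial][customer] = (name_of(serial), parent_name)

def root_of(table, customer):
    """The root is the customer's only entry without a parent value."""
    roots = [row[customer][0] for row in table.values()
             if customer in row and row[customer][1] is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root per customer")
    return roots[0]

def boxes_of(table, customer):
    """All boxes with a non-blank entry in this customer's columns."""
    return sorted(serial for serial, row in table.items() if customer in row)

table = new_table(["AT1", "AT2", "AT3", "AT4"])
# Customer 1: AT1 on top, AT2 and AT3 below it; AT3 is left unnamed.
enter_sheet(table, 1, {"AT1": ("top", None), "AT2": ("left", "AT1"), "AT3": (None, "AT1")})
# Customer 2: AT3 on top, AT2 and AT4 below it; AT2 is left unnamed.
enter_sheet(table, 2, {"AT3": ("hub", None), "AT2": (None, "AT3"), "AT4": ("spare", "AT3")})
```

In this sketch, box AT3 ends up with an entry for each customer, and each customer's boxes can be read straight out of that customer's columns.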

Concluding remarks:

So, what is the difference between the forest and dag?

The following explanation is my interpretation of the data structures of forest and dag, and my analysis of the comparison to follow may not be correct, in which case I am more than happy to have someone explain what would be correct. However, if it is correct, I think it will clearly demonstrate why I have considered this issue to have the "severity" I have attributed to it (and why there has been this whole sequence of emails and proposed revs to the spec).

The difference, as I see it, is:

For the forest, the job is done. The forest resource manager can add new customers, and can remove customers by clearing out their 2 columns and freeing them up for a new customer.
For the dag, there is an "optimization" that can now be made. The dag resource manager notices that there are many empty pairs of cells in the table, representing boxes that the current customer in each column is not using. Therefore this data can be "compressed" so that each row uses only two columns for each customer that actually has that box in use.

As is well known, for every optimization there is usually a cost. The cost of this optimization, as described (and assuming this is an accurate representation of the information missing from the dag), is as follows. Previously, a specific customer's set of boxes could be determined just by finding the customer's named root node: all the non-blank boxes in that column belonged to the customer. With this "optimization" the association of customer to column is destroyed, and that information is either no longer available or has to be obtained by other means.
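
A rough sketch of this cost, under the same caveat that it follows my interpretation of the dag and may not be correct. The table contents, the function names and the "slot" terminology are all hypothetical. The forest table keeps each (name, parent name) pair under its customer; the compressed table packs each row's non-empty pairs to the left.

```python
# Hypothetical illustration of the "compression" described above; names are made up.

# Forest table: serial -> {customer: (name, parent_name)}.
forest = {
    "AT1": {1: ("top", None)},
    "AT2": {1: ("left", "top"), 2: ("AT2", "hub")},
    "AT3": {1: ("AT3", "top"), 2: ("hub", None)},
    "AT4": {2: ("spare", "hub")},
}

def boxes_of_customer(forest, customer):
    """In the forest table, a customer's boxes are the non-blank cells in its columns."""
    return sorted(serial for serial, row in forest.items() if customer in row)

def compress(forest):
    """Pack each row's non-empty pairs to the left, dropping the customer key."""
    return {serial: list(row.values()) for serial, row in forest.items()}

def boxes_in_slot(compressed, slot):
    """After compression, a column position ("slot") no longer means one customer."""
    return sorted(serial for serial, pairs in compressed.items() if len(pairs) > slot)

compressed = compress(forest)
customer_1_boxes = boxes_of_customer(forest, 1)
first_slot_boxes = boxes_in_slot(compressed, 0)
```

Here customer_1_boxes contains only AT1, AT2 and AT3, while first_slot_boxes contains all four boxes: the first slot of AT4 holds customer 2's entry, and nothing in the compressed row says so.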

Bottom line: again as I see it, this is the problem facing the xacml-commenter referenced in earlier emails, who was seeking advice and was presumably guided by the spec in its current form to dismantle their URIs. To me, this appears as if they are basically destroying information that they had already established as a sunk cost, and constraining themselves to work within the more limited framework that the lesser information provides.

Comments welcome.

Thanks,
Rich

Follow-Ups:
- Re: [xacml] Example of dag and forest used to manage collection of resources for comparison
  - From: Daniel Engovatov <daniel@streamdynamics.com>