At this week's TC meeting, when we discussed the proposed changes to
the Hierarchical Resource Profile,
I was asked to provide an example distinguishing the dag and the forest so
people could more easily understand the core of the issue and its impact on
things in general.
It consists of:
- some assumptions about a real-world situation, which we
will use to frame the concepts
- a problem statement
- a manual data collection process during which multiple sources of
authority define hierarchies associated with a common set of resources
- a hypothetical "resource manager program", which can manage the
data as a dag or as a forest, so that we can consider the implications of each
- some concluding remarks
Assumptions:
- There is a flat collection of resources, which can be anything:
records in a database, people, or, for the sake of concreteness, a
bunch of large physical metal containers in a storage area
or warehouse near a shipping port. The problem to consider is the
management of these containers and the customers that use them. Each
container has a large painted alphanumeric identifier on it, such as
"AT12345XY", with no other properties except that there are no duplicates.
The containers are very large and typically can be used to provide
storage for more than one customer.
- We can draw a picture of these containers as a collection of
boxes on a piece of paper. The point is that we are going to model
the management of these containers both as a forest and as a dag, and use
this drawing to conceptualize the problem and the forest and dag solutions.
- Let us assume that there are two customers interested in these
containers, that they each have their own uses and their own labels
for the containers, and that, for whatever reason, they each choose
to organize the containers as a single rooted hierarchy.
Problem statement:
- Design a "resource manager" for these containers and the 2
customers so that the hierarchy each customer specifies is used to
initially establish the hierarchical relationship each customer has
requested among the containers the customer chooses to use.
Manual data collection process to follow:
- As Hal suggested at the TC meeting, let's assume we have all the
containers drawn as boxes on a piece of paper, which shows all the
boxes, each with a serial number in the upper left corner of an
otherwise blank box.
- Now each customer will have their own sheet and, in their own
separate color, draw their own hierarchy on the sheet, which will show a
top node and then child nodes recursively, for whatever boxes they
choose to be in their collection. In addition to showing the
hierarchical lines connecting their boxes, each customer may pick their
own "name" for each box, or they can leave it blank and we will use the
serial number instead when the data is entered. The only requirement on
the customer-specific names is that they be unique on the sheet. It
does not matter what any other customer defines on another sheet. If
the customer is lazy or simply does not want to think up names, that is
ok, because we will be able to use the serial numbers instead.
- When the 2 customers are done they will hand in their sheets to
the "guy" who runs the warehouse and he will enter the data from the
sheets into the resource manager program on his laptop.
The "hypothetical" resource manager program:
- Assume the program is a little database, and that an initial
database has been set up that has one row for each container, with
the serial number of the container as the primary key in column zero.
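As a rough illustration of that starting point, here is a minimal Python sketch of the initial "empty" database. The function name `make_initial_table` and all serial numbers other than "AT12345XY" are made up for illustration, and reserving two blank cells per customer up front is an assumption that anticipates the name and parent columns filled in during data entry.

```python
# Hypothetical sketch of the initial "empty" database.
# Only "AT12345XY" comes from the example; the other serials are invented.
SERIALS = ["AT12345XY", "AT12346XY", "AT12347XY"]


def make_initial_table(serials, num_customers):
    """Return one row per container.

    Column 0 holds the serial number (the primary key). Each customer
    then gets two blank cells: one for the customer's name for the box
    and one for the name of its parent (assumed layout).
    """
    if len(set(serials)) != len(serials):
        raise ValueError("serial numbers must be unique")
    return {s: [s] + [None] * (2 * num_customers) for s in serials}


table = make_initial_table(SERIALS, num_customers=2)
```

Keying the rows by serial number mirrors the primary key in column zero; any real database table with a unique key on that column would serve the same purpose.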
- Collect information for both the dag and the forest:
- Enter the data for one sheet at a time, i.e. for each sheet, i:
  - find the top of the customer hierarchy (from the
  customer-drawn diagram, pick the serial number in the box at the top of
  the customer-drawn hierarchy) and
    - enter the customer-specified name as the (2i-1)th column value
    in the row whose identifier equals the serial number of the box on
    the sheet of paper; that row, by definition, exists in
    the initial "empty" database (a list of all the available boxes) (if no
    customer-specified name is supplied, put in a copy of the row's serial
    number instead)
  - for each child (iteratively until all of the customer's boxes are
  entered)
    - enter the customer-specified name as the (2i-1)th column
    value in the row whose identifier equals the serial number of the box
    on the sheet of paper (if no customer-specified name is supplied, put
    in a copy of the row's serial number instead)
    - enter the name of the parent node as the (2i)th column value
    in the same row
When this data entry is complete for both sheets, we should have all of
customer 1's boxes with their customer-selected name in column 1 and the
parent's customer-selected name in column 2. And for customer 2's boxes
we will have the same information in columns 3 and 4.
This process can be repeated indefinitely for any number of
customers, and each customer can choose any subset of boxes with any
specific box as the top of the customer-specific hierarchy,
recognizable by the fact that it is the only entry that does not have a
parent value in the (2i)th column.
At this point the situation is set up completely, so the forest
resource manager or the dag resource manager can begin operation.
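The entry procedure can be sketched roughly as follows. This is only an illustration under assumptions: the function `enter_sheet`, the shape of the `sheet` argument, the shortened serial numbers and the customer names are all hypothetical, and the table is a plain Python dict standing in for the little database.

```python
def enter_sheet(table, i, sheet):
    """Enter customer i's sheet (i starts at 1) into the table.

    `sheet` maps a box serial number to a pair
    (customer_name_or_None, parent_serial_or_None). Exactly one box,
    the top of the customer's hierarchy, has no parent.
    Returns the customer's name for the top box.
    """
    name_col, parent_col = 2 * i - 1, 2 * i
    # A blank name falls back to the box's serial number.
    names = {serial: (name or serial) for serial, (name, _) in sheet.items()}
    if len(set(names.values())) != len(names):
        raise ValueError("customer names must be unique on a sheet")
    tops = [s for s, (_, parent) in sheet.items() if parent is None]
    if len(tops) != 1:
        raise ValueError("a sheet must have exactly one top box")
    for serial, (_, parent) in sheet.items():
        row = table[serial]  # KeyError if the box is not in the warehouse
        row[name_col] = names[serial]
        row[parent_col] = names[parent] if parent is not None else None
    return names[tops[0]]


# Initial "empty" database: serial number plus two blank cells per customer.
table = {s: [s, None, None, None, None] for s in ["AT1", "AT2", "AT3", "AT4"]}

# Customer 1 leaves the name of AT3 blank; customer 2 names every box.
sheet1 = {"AT1": ("north", None), "AT2": ("north-a", "AT1"), "AT3": (None, "AT1")}
sheet2 = {"AT2": ("bulk", None), "AT3": ("bulk-1", "AT2"), "AT4": ("bulk-2", "AT2")}

top1 = enter_sheet(table, 1, sheet1)
top2 = enter_sheet(table, 2, sheet2)
```

In this made-up data, box AT3 ends up with one parent per customer ("north" in column 2 and "bulk" in column 4), and each customer's top box is the only one of their boxes with a blank parent cell.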
So, what is the difference between the forest and dag?
The following explanation is my interpretation of the data structures
of forest and dag, and my analysis of the comparison to follow may not
be correct, in which case I am more than happy to have someone explain
what would be correct. However, if it is correct, I think it will
clearly demonstrate why I have considered this issue to have the
"severity" I have attributed to it (and why there has been this whole
sequence of emails and proposed revs to the spec).
The difference, as I see it, is:
- For the forest, the job is done. The forest resource
manager can add new customers, and remove customers by clearing out
their 2 columns and freeing them up for a new customer.
- For the dag, there is an "optimization" that can now
be made. The dag resource manager notices that there are many empty
pairs of cells in the table, representing boxes that the current
customer in each column is not using. Therefore this data can be
"compressed" so that each row only uses two columns for each
customer that actually has that box in use.
As is well known, for every optimization there is usually a cost. The
cost of this optimization, as described (and if this is an accurate
representation of the information missing from the dag), is the loss of
the ability to identify a specific customer's set of boxes. Previously
this could be done just by finding the customer's named root node: all
the non-blank boxes in that column belonged to the customer. With this
"optimization" the association of customer to column is destroyed, and
that information is either no longer available or has to be obtained by
other means.
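To make the loss concrete, here is a small Python sketch of the "compression" as I have interpreted it. It follows the interpretation given in this email, which may not match what a real dag implementation does; the function names `customer_boxes` and `compress` and all the data are hypothetical.

```python
def customer_boxes(table, i):
    """Column-based lookup: boxes with a non-blank name in column 2i-1."""
    col = 2 * i - 1
    return sorted(
        row[0] for row in table.values()
        if len(row) > col and row[col] is not None
    )


def compress(table):
    """Drop empty (name, parent) pairs and shift the rest left.

    After this, the position of a pair no longer identifies which
    customer entered it.
    """
    compressed = {}
    for serial, row in table.items():
        pairs = [(row[c], row[c + 1]) for c in range(1, len(row), 2)]
        used = [pair for pair in pairs if pair[0] is not None]
        compressed[serial] = [serial] + [cell for pair in used for cell in pair]
    return compressed


# Made-up table: customer 1 in columns 1-2, customer 2 in columns 3-4.
table = {
    "AT1": ["AT1", "north", None, None, None],
    "AT2": ["AT2", "north-a", "north", "bulk", None],
    "AT3": ["AT3", "AT3", "north", "bulk-1", "bulk"],
    "AT4": ["AT4", None, None, "bulk-2", "bulk"],
}

before = customer_boxes(table, 1)
after = customer_boxes(compress(table), 1)
```

On this data, the lookup for customer 1 returns AT1, AT2 and AT3 before compression, but all four boxes afterwards, because customer 2's entry for AT4 has been shifted into column 1. The same lookup for customer 2 loses AT4 for the same reason.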
Bottom line: again, as I see it, this is the problem faced by the
xacml-commenter referenced in earlier emails, who was seeking
advice and was presumably being guided by the spec in its current form
to dismantle their URIs. To me, it appears that they are basically
destroying information that they had already established as a sunk cost,
and constraining themselves to work within the more limited framework
that the lesser information provides.