regrep-query message

Subject: Re: Discussion of getPath() method

From: Farrukh Najmi <Farrukh.Najmi@Sun.COM>
To: regrep-query@lists.oasis-open.org
Date: Tue, 25 Sep 2001 11:20:29 -0400

First, I am sorry for a mistake/typo leading to inconsistency in my previous post. The algorithm proposed in

http://lists.oasis-open.org/archives/regrep-query/200109/msg00047.html

uses name of the scheme and the code of the nodes and therefore the answers would be:

However, I can see your point about using the id of the scheme instead of the name.
In fact I like it even better, as it gives a unique beginning element for the path.
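
To make the path construction concrete, here is a minimal Python sketch. It is not the algorithm from the referenced post verbatim: it assumes each node holds a reference to its parent and to its scheme, and it joins the scheme identifier and the node codes with "/". The names ClassificationScheme, ClassificationNode, get_path and use_scheme_id, as well as the leading "/", are illustrative assumptions and not part of ebRIM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationScheme:
    id: str    # e.g. "urn:org:un:spsc:cs2001"
    name: str  # e.g. "UNSPSC"; not guaranteed to be unique


@dataclass
class ClassificationNode:
    code: str
    scheme: ClassificationScheme
    parent: Optional["ClassificationNode"] = None  # None for a first-level node


def get_path(node: ClassificationNode, use_scheme_id: bool = True) -> str:
    """Return "/<scheme>/<code>/.../<code>" by walking parent links to the root."""
    codes = []
    current = node
    while current is not None:
        codes.append(current.code)
        current = current.parent
    codes.reverse()  # collected leaf-to-root, so flip to root-to-leaf
    root = node.scheme.id if use_scheme_id else node.scheme.name
    return "/" + "/".join([root] + codes)


# The UNSPSC example from this thread.
unspsc = ClassificationScheme(id="urn:org:un:spsc:cs2001", name="UNSPSC")
n32 = ClassificationNode("32", unspsc)
n11 = ClassificationNode("11", unspsc, parent=n32)
n18 = ClassificationNode("18", unspsc, parent=n11)

print(get_path(n18))                       # /urn:org:un:spsc:cs2001/32/11/18
print(get_path(n18, use_scheme_id=False))  # /UNSPSC/32/11/18
```

Switching between the scheme id and the scheme name changes only the first path element, which is why the choice can be discussed separately from how the node codes are joined.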

We also need to discuss what exact subset of XPATH we should support. I see the following bare bones minimum as quite adequate, intuitively obvious to the user and very simple to implement for registry implementors without any XPATH background:

Syntax                             Technical Description                      Use Case Description
---------------------------------  -----------------------------------------  ------------------------------------
/urn:org:un:spsc:cs2001/32/11/18   Fully qualified absolute path              Find the exact node specified by path
/urn:org:un:spsc:cs2001/*/*/18     '*' wildcard syntax for one or more level  Find all level 3 nodes (great grand children) with code 18 of UNSPSC scheme
/urn:org:un:spsc:cs2001/32//18     '//' syntax to match any descendent        Find any descendent of first level node with code 32 that has code 18
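
As a rough illustration of how little machinery this subset needs, the following Python sketch matches a pattern against an absolute node path. It rests on assumptions that are not settled in this thread: each '*' stands for exactly one level, '//' skips zero or more intermediate levels, and both pattern and path start with "/". The function names matches and _match and the sample paths are hypothetical.

```python
from typing import List


def matches(pattern: str, path: str) -> bool:
    """Check an absolute node path against the three-form pattern subset."""
    if not (pattern.startswith("/") and path.startswith("/")):
        raise ValueError("pattern and path must both be absolute")
    # Drop the empty string produced by the leading "/".
    # An empty element left inside the pattern marks a "//".
    return _match(pattern.split("/")[1:], path.split("/")[1:])


def _match(pat: List[str], segs: List[str]) -> bool:
    if not pat:
        return not segs  # pattern used up: match only if the path is too
    head, rest = pat[0], pat[1:]
    if head == "":  # "//": try skipping 0, 1, 2, ... levels
        return any(_match(rest, segs[i:]) for i in range(len(segs) + 1))
    if not segs:
        return False
    if head == "*" or head == segs[0]:  # one-level wildcard or exact code
        return _match(rest, segs[1:])
    return False


scheme = "/urn:org:un:spsc:cs2001"
paths = [scheme + "/32/11/18", scheme + "/32/18", scheme + "/43/11/18"]

for pattern in (scheme + "/32/11/18", scheme + "/*/*/18", scheme + "/32//18"):
    print(pattern, "->", [p for p in paths if matches(pattern, p)])
```

Under these assumptions the fully qualified pattern selects only the first sample path, the '*' pattern selects the two paths that are three levels deep, and the '//' pattern selects the two paths under code 32.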

Note I removed the '|' syntax for boolean logic as filter query already supports this quite well. One question is whether the path elements in XPATH syntax should allow for SQL LIKE based pattern matching. I propose that they do not.

And yes we need other opinions as well.

Finally, about the commercial. Here are some reasons why we should not support both XPATH and path/pathElement:

-What is the use case for adding path/pathElement that is not addressed adequately by XPATH?

-XPATH is simpler for the client programmer
See concrete examples at http://lists.oasis-open.org/archives/regrep-query/200109/msg00047.html

-It is generally accepted in software that a good design should avoid having more than one way of doing the same thing

-If we allowed path as level/value pairs, it:
    -Makes the API more complex for the programmer. Imagine if, in a Unix OS, a user creating a file had to create path and pathElements and compose them together through some API instead of just typing /user/najmi/file.dat.
    -Introduces abstractions that add clutter and do not solve any unsolved problems
    -Does not free a registry implementation from implementing XPATH support. I hope you are not implying that we should leave it as a choice whether a registry implements XPATH support or path/getPath support? I do not believe we should give registry implementations non-interoperable choices to implement one or the other. If both are supported and both are required, then why have both when one is clearly simpler for the client and the registry implementor?

Len Gallagher wrote:

We need other opinions here! I'm concerned about pre-pending the "name" or
the "id" to the string one gets from the getPath() method.
In the first example, Farrukh prepends the name. But in the second example
he prepends the id.
There is no guarantee that the "name" of a classification scheme is a
unique identifier for it. In particular UNSPSC could refer to the 2001
version or the upcoming 200x version, or some completely unrelated scheme.
If we must prepend one of these items it must be the "id".
But prepending the "id" is also unsatisfactory to me because the "id" may
be a 128 bit UUID that is completely meaningless to a human. No one would
use such a system that required such id's to be a visible part of a human
readable path.
I'm also a bit concerned that the path for UNSPSC code "321118" would be a
forced splitting up of the code into 32/11/18. However, I can live with
that if there is some other way for a user to ask for all items that are
classified by "321118" or some sub-classification of "321118". It doesn't
work to ask the user to split up the code into its constituent parts
because many users, and many software clients, won't have any idea how to
do that, especially if the classification is an external classification
like the Library example referenced below. Even the U.S./Canada NAICS
classification scheme uses a different number of digits for each level and
most client software systems won't know how many digits to allow for each
level.
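
To illustrate the concern, a small Python sketch of what a client would have to do to turn an embedded code into a path. The function name split_code and the level widths passed to it are assumptions made for illustration; nothing in ebRIM or in this thread says where a client would obtain the widths, which is exactly the difficulty.

```python
from typing import Sequence


def split_code(code: str, level_widths: Sequence[int]) -> str:
    """Split an embedded code into "/"-separated levels using per-level widths."""
    parts = []
    pos = 0
    for width in level_widths:
        if pos == len(code):
            break  # the code stops at a shallower level
        piece = code[pos:pos + width]
        if len(piece) < width:
            raise ValueError("code does not end on a level boundary")
        parts.append(piece)
        pos += width
    if pos != len(code):
        raise ValueError("code is longer than the declared levels")
    return "/".join(parts)


# Two digits per level, as in the 32/11/18 example.
print(split_code("321118", (2, 2, 2, 2)))     # 32/11/18

# The same digits under a hypothetical scheme whose levels vary in width.
print(split_code("321118", (2, 1, 1, 1, 1)))  # 32/1/1/1/8
```

The same six digits yield different paths depending on the widths, so a client that does not know the scheme's layout cannot produce the split reliably.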
Now comes the commercial!!
If our XML syntax for submitting or querying a Classification instance were
extended so that either "code" or "path" (or both!) could be used as part
of the classification, then I think we could avoid the problem raised
above. I think it is possible to modify our existing XML specification for
Classification, in an upward compatible way, to do just that! But first we
need to agree on what the getPath() method returns for known classification
schemes.
-- Len
At 08:34 PM 9/24/01, Farrukh Najmi wrote:
>This is a very good discussion. Please see my responses inline.
>
>Len Gallagher wrote:
>
> > Registry Query team,
> >
> > During last Friday's teleconference we discussed the getPath() method
> > defined for the ClassificationNode class in ebRIM (cf Section 10.2.4
> > page 38).
> >
> > At present this method is only superficially specified in ebRIM. I think
> > the confusion we're all having in trying to understand what one another is
> > saying is a direct result of the lack of specification for the getPath()
> > method. Can we have a discussion in the Query team as to what we think
> > should get returned by getPath()?
> >
> > Consider a few examples:
> >
> > 1) ClassificationScheme
> >       id="urn:org:un:spsc:cs2001"
> >       name="UNSPSC"
> >
> >     ClassificationNode
> >       id="UUID1"
> >       name="Electronic Components and Supplies"
> >       code="32"
> >       parent=???
> >
> >     ClassificationNode
> >       id="UUID2"
> >       name="Diodes and transistors and semiconductor devices"
> >       code="11"
> >       parent="UUID1"
> >
> >     ClassificationNode
> >       id="UUID3"
> >       name="Integrated circuit components"
> >       code="18"
> >       parent="UUID2"
> >
> >   What string value should getPath() applied to node UUID3 return?
> >
> >    a) "urn:org:un:spsc:cs2001/321118"
> >
> >    b) "urn:org:un:spsc:cs2001/32/11/18"
> >
> >    c) "UNSPSC/32/11/18"
> >
> >    d) "UNSPSC/321118"
> >
> >    e) "321118"
> >
> >    f) "32/11/18"
> >
> >    g) "UNSPSC/Electronic Components and Supplies/Diodes and transistors
> >                and semiconductor devices/Integrated circuit components"
>
>According to the algorithm I described in:
>
>     http://lists.oasis-open.org/archives/regrep-query/200109/msg00047.html
>
>The answer is (c)
>
> >
> >
> > I don't think there's any value in trying to carry along the names for the
> > classification scheme or the names of the various nodes in its hierarchy.
> > There's just too much chance for error. So I think we should concentrate on
> > id's and/or codes. That would eliminate choices c), d) and g). Next, I
> > think we should make a clear distinction between the classification scheme
> > itself and the nodes in its hierarchy. I see no value in a) or b) since we
> > can use separate methods to get at that scheme information. That leaves e)
> > or f). My vote goes for e), because f) would require people to remember
> > how many digits are in each level of the path and sometimes (e.g. NAICS)
> > that varies.
> >
> > CONCLUSION: For multi-level coded classification schemes, i.e.
> > classifications schemes for which each node's "code" is an embedded
> > representation of the path leading to that node, getPath() should return
> > just the "code" for that node.
>
>First, I believe that the term multi-level coded classification scheme is a
>misnomer here. I believe you are looking for a term to describe schemes
>that embed the path of a node in the node's code.
>
>Not sure why you say it is wrong to carry the name of the scheme in the
>path. I agree we should not carry the name of the node in the path.
>
>I believe that our spec should be blind about any meaning implied in the
>code for a scheme and simply follow the algorithm I described.
>
> >
> >
> > 2) ClassificationScheme
> >       id="urn:ebxml:trees:v1"
> >       name="Modern Day Tree Types"
> >       description="This scheme defines the Genus and Species of modern day
> > trees"
> >
> >     ClassificationNode
> >       id="UUID4"
> >       name="Acer"
> >       code="Acer"
> >       parent=???
> >       description="<enUS> Genus name for any maple tree"
> >
> >     ClassificationNode
> >       id="UUID5"
> >       name="barbatum"
> >       code="barbatum"
> >       parent="UUID4"
> >       description="<enUS> Species name for Southern Sugar Maple"
> >
> >   What string value should getPath() applied to node UUID5 return?
> >
> >    a) "Modern Day Tree Types/Acer/barbatum"
> >
> >    b) "urn:ebxml:trees:v1/Acer/barbatum"
> >
> >    c) "Acer/barbatum"
> >
> >    d) "Genus:Acer/Species:barbatum"
> >
> > For the same reasons as above, I think we should rule out a) and b). I
> > don't like d) so much because I don't think we should mix level names with
> > the path leading to a node. Instead, if level names are important, we
> > should extend our model to allow the user to define level names, with a new
> > method on ClassificationScheme to getClassificationLevels() and a new
> > method on ClassificationNode to getLevelName(). I think c) is the proper
> > result for getPath().
> >
> > CONCLUSION: For a general purpose multi-level classification scheme, where
> > it is not known whether or not the code attribute for ClassificationNode
> > carries an embedded path representation, getPath() should return a sequence
> > of codes from the first to last levels of the classification scheme. Each
> > code should be separated from the others by a "/".
>
>I believe the path must include the scheme name in order for it to be
>absolute. This is similar to how the root directory plays a role in file
>paths in a file system.
>
>So according to the algorithm I proposed the correct answer would be (b)
>
> >
> >
> > c) Any 1-level Enumeration Classification Scheme
> >
> > CONCLUSION: For any node N in a 1-level classification scheme, e.g. all
> > enumeration domains, the getPath() method should return a value equal to
> > the "code" attribute for that node.
>
>Again the scheme name must be part of the path so it would be:
>
>/schemeName/codeAttributeForNode
>
>according to the proposed algorithm
>
> >
> >
> > d) Consider the Library Classification Scheme discussed in a previous email
> > message.
> >
> >
> > http://lists.oasis-open.org/archives/regrep-ex-scheme/200109/msg00004.html
> >
> > This example defines a multi-level external classification scheme.
> >
> > CONCLUSION: For external classifications, the submitter of the
> > classification should be allowed to provide as much information as possible
> > to help the Registry determine what is the intended "code" and "path" and
> > "pathDepth" of each node referenced by the Classification instance. For
> > example, the submitter should be allowed to say that the path for the
> > classification of a book is "TA357.5", since that is the preferred
> > embedding for the entire path of the node. But the submitter should also be
> > allowed to submit a pathDepth value of 3, or a pathRepresentation like
> > "TA/357/5", so that the Registry can support queries over the separate
> > levels.
>
>External classifications are the only kind of classification that UDDI
>does. As such, UDDI has honed it down reasonably well. If you study what
>UDDI does in this area, they have no notion of pathDepth etc. All they
>have is the notion of a keyed reference, which is a tuple consisting of a
>scheme, keyName and a keyValue.
>
>This is exactly what I have proposed in the external classification
>proposal that is being considered within that sub-team.
>
>I am not convinced of any use case that pathDepth can address that is not
>addressed by the examples I gave in:
>
>     http://lists.oasis-open.org/archives/regrep-query/200109/msg00047.html
>
>
> >
> >
> > Any other opinions?
> >
> > -- Len
> >
> > **************************************************************
> > Len Gallagher                             LGallagher@nist.gov
> > NIST                                      Work: 301-975-3251
> > Bldg 820 Room 562                        Home: 301-424-1928
> > Gaithersburg, MD 20899-8970 USA           Fax: 301-948-6213
> > **************************************************************
> >
>
>--
>Regards,
>Farrukh
>

--
Regards,
Farrukh


Follow-Ups:
- RE: Discussion of getPath() method
  - From: Matthew MacKenzie <matt@xmlglobal.com>

References:
- Discussion of getPath() method
  - From: Len Gallagher <LGallagher@nist.gov>
- Re: Discussion of getPath() method
  - From: Len Gallagher <LGallagher@nist.gov>
