regrep message

Subject: RE: re the simple case
From: "Kearns, Una" <una.kearns@documentum.com>
To: regrep@lists.oasis-open.org
Date: Fri, 24 Mar 2000 18:38:52 +0100
Hi,

We are getting a good discussion going. 

I actually believe we are all in agreement to a certain extent.

Below is an extract from the FAQs of ISO 11179:

"C.4  Why does this part of ISO/IEC 11179 not address the registration of
other objects, such as Object
Classes and Data Element Concepts?

Although the above objects may need to be registered for certain purposes,
the scope of ISO/IEC 11179
is limited, at this point, to Data Elements."

As I said in my original email -- I do think that there are good concepts in
ISO 11179 for registering and identifying "items/data-elements, etc." but it
is not FULLY applicable to registering Schemas, etc.
I do believe that much of the metadata is relevant but I do not think we are
compliant with ISO 11179 -- as they even state themselves, they are only
concerned with registering data elements.

We have diverged from ISO 11179 by having registration of
data-element-dictionaries, sets, etc.

What I would like us to do is to have the notion of a "registered-item" ---
this could be the same as a data-element but for our purposes we have just
extended the notion of what a data-element is, i.e. could be a DTD, Schema,
whatever we want to register and then the relationships between
data-elements - and I think this maps to what Jon is saying.   Let's get rid
of registering data-element-dictionaries, sets, and data-elements, etc.,
and keep to registering one thing which relates to other things -- I think
it will make it much easier to comprehend and then just have packages for
submitting, updating registered-items.  So let's say in the future we have
everybody doing wonderful analysis and registering individual elements and
attributes they could then also register the relationship of "containment"
to represent a given tree-structure.   Looking at the NIST design they also
have a notion of one registered item.
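
A minimal Python sketch of the single registered-item idea, as an
illustration only.  The names RegisteredItem and Registry, the ids, and the
"kind" labels are hypothetical; none of them come from the regrep spec,
ISO 11179, or the NIST design.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredItem:
    """Anything the registry stores: a DTD, a schema, an element, an attribute."""
    item_id: str   # hypothetical identifier scheme
    kind: str      # free-form label such as "dtd" or "element" (assumption)
    name: str


@dataclass
class Registry:
    items: dict = field(default_factory=dict)
    # Each relationship is a (source_id, relationship_type, target_id) triple.
    relationships: list = field(default_factory=list)

    def register(self, item):
        self.items[item.item_id] = item
        return item

    def relate(self, source_id, rel_type, target_id):
        # Both ends of a relationship must already be registered items.
        if source_id not in self.items or target_id not in self.items:
            raise KeyError("both ends of a relationship must be registered")
        self.relationships.append((source_id, rel_type, target_id))

    def related(self, source_id, rel_type):
        return [self.items[t] for s, r, t in self.relationships
                if s == source_id and r == rel_type]


registry = Registry()
registry.register(RegisteredItem("ri-1", "dtd", "purchase-order"))
registry.register(RegisteredItem("ri-2", "element", "transaction-number"))
registry.relate("ri-1", "containment", "ri-2")
contained = [item.name for item in registry.related("ri-1", "containment")]
```

The point of the sketch is that there is only one kind of thing to register;
whether an item is a DTD or an element shows up only in its metadata and in
the relationships it takes part in.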

We would have to be very careful how we identify types of relationships and
order within those relationships to properly model a tree structure and this
is not really specified to any level of detail in ISO 11179 - but I think
this is probably a future rev of the spec.  ( We would need a relationship
type of containment and the order number specified). I also think we would
probably need to extend the metadata to capture fully registered elements
and attributes.  We would also have to be very careful in specifying rules
for updating registered-items that are parents of a containment
relationship as really adding a child data-element modifies the structure of
a parent data-element.
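
A rough Python sketch of ordered containment, under stated assumptions: the
add_child and tree helpers are hypothetical, and the rule that adding a child
bumps the parent's version is only one possible update policy, not something
ISO 11179 or the regrep spec defines.  The element names are borrowed from the
'address' example in ISO 11179-3.

```python
from collections import defaultdict

# parent id -> list of (order number, child id)
containment = defaultdict(list)
# item id -> version number, starting at 1 (assumed versioning scheme)
versions = defaultdict(lambda: 1)


def add_child(parent, child, order):
    """Record an ordered containment and treat it as a change to the parent."""
    if any(o == order for o, _ in containment[parent]):
        raise ValueError(f"order {order} already used under {parent}")
    containment[parent].append((order, child))
    versions[parent] += 1  # assumption: a new child revises the parent


def tree(item, depth=0):
    """Return the structure as indented lines, children sorted by order number."""
    lines = ["  " * depth + item]
    for _, child in sorted(containment.get(item, [])):
        lines.extend(tree(child, depth + 1))
    return lines


# Children are added out of order; the order number restores the sequence.
add_child("address", "city-name", 2)
add_child("address", "street-name", 1)
structure = tree("address")
```

Without the order number, the registry could say which items an address
contains but not in what sequence, which is not enough to rebuild a content
model.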

I must say one thing that keeps confusing me is the difference between --
data-element-association and related-data-reference.

data-element-association --- I get is specified as a relationship that
exists between two data-elements (containment would be an example).

related-data-reference --- is defined as a reference between the data
element and any related data  i.e. documentation etc... 

The NIST design describes it as I originally interpreted it:
ASSOCIATION being relationship between data-elements (i.e. strong
association) -- contained in etc....

While related_data_items they define as identification of non-registered
items related to a registered element.  In most cases such references will
be URLS or URNS that identify reference manuals, white-papers, example sets,
style-sheets, etc..

Now this is what I feel we have been having the debate over --- whether these
examples need to be registered items, which came out in our last call ---
i.e. I cannot have any relationship except to registered items, so all
references must be registered in the repository even if they are to external
references.    If this is the case then I am wondering whether we need two
relationship types --- maybe we need one relationship element that defines
different types of relationships.
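
A small Python sketch of what one relationship element with a type might look
like, assuming every target is itself a registered item.  The type names, the
external_uri field, and the example.org URL are all placeholders invented for
illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelationType(Enum):
    CONTAINMENT = "containment"      # strong association between items
    DOCUMENTATION = "documentation"  # e.g. reference manual, white paper
    STYLESHEET = "stylesheet"


@dataclass(frozen=True)
class Item:
    item_id: str
    name: str
    # Set when the registered item only points at content held elsewhere.
    external_uri: Optional[str] = None


@dataclass(frozen=True)
class Relationship:
    source: Item
    target: Item
    rel_type: RelationType


po_dtd = Item("ri-1", "purchase-order DTD")
txn = Item("ri-2", "transaction-number")
manual = Item("ri-9", "purchase-order manual",
              external_uri="http://example.org/po-manual")  # placeholder URL

relationships = [
    Relationship(po_dtd, txn, RelationType.CONTAINMENT),
    Relationship(po_dtd, manual, RelationType.DOCUMENTATION),
]


def targets(source, rel_type):
    """Names of the items related to source by the given relationship type."""
    return [r.target.name for r in relationships
            if r.source == source and r.rel_type == rel_type]
```

Here targets(po_dtd, RelationType.DOCUMENTATION) and
targets(po_dtd, RelationType.CONTAINMENT) go through the same structure; the
only difference between an association and a related-data reference is the
type value.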

 

Thanks,

Una
-----Original Message-----
From: Terry Allen [mailto:tallen@sonic.net]
Sent: Thursday, March 23, 2000 3:39 PM
To: regrep@lists.oasis-open.org
Subject: Re: re the simple case


Jon wrote:
| Terry has already responded to points raised by Una Kearns in her
| message of 2000.03.20.  Here I will assay my own response, partly
| in an attempt to engage in a meaningful conversation on this
| subject but mostly, I will confess, to see whether I have
| understood anything at all from my first reading of 11179.
|
| I haven't yet gone over the regrep spec itself, so nothing in
| what I have to say at this point relates in the slightest to the
| regrep DTD and its connection to 11179, only to 11179 itself.
|
| The part of Una's message that interests me most is this:
|
| | I have spent quite a few hours reading and re-reading ISO 11179
| | and also the regrep spec.
| |
| | Firstly -- I feel we are trying to pigeon-hole what we are trying
| | to accomplish against a specification that had a different purpose
| | in mind.  From reading the EBXML mailing list -- I feel that
| | there are people there of the same opinion.
| |
| | There are definitely common good concepts in ISO 11179 that apply
| | to registration and metadata that apply just generally to basic
| | modelling of information, but there is no-way we can say we are
| | 100% conformant to ISO 11179 and I don't think we would want to be
| | or nobody would use the XML.org Registry and Repository.
| |
| | The basic reason why is:
| |
| | ISO 11179 is about registering and storing data-elements (and
| | that's it).  A data element is, as they state themselves, an
| | indivisible unit of data; examples they give themselves would be:
| | country of origin code, employee number, an employee lastname,
| | product description, etc...
|
| My initial impression of 11179 was largely the same as Una's --
| that this specification was designed for the registration of
| labeled atomic data items like part numbers and units.  On what I
| think is a closer reading I now believe that this impression is
| only partly true.  What I now believe is that 11179 comprehends a
| much deeper notion of what can constitute a data element but that
| its authors were concerned mainly with the application of the
| standard to relational databases and that this bias so strongly
| permeates the examples that one gains a misleading impression of
| what a data element can be.

Yes.  In fact if you extend the present OASIS spec and allow
composite data elements (which Jon gets to below) you have something
like an XML Schema without some of the bells and whistles.

| Una says
|
| | A data element is, as they state themselves, an indivisible unit
| | of data
|
| but that's not quite right.  What 11179-1 says is that a data
| element is
|
|    A unit of data for which the definition, identification,
|    representation, and permissible values are specified by means
|    of a set of attributes.
|
| As far as I can tell, this says that a data element is any piece
| of data about which metadata can be asserted, or more simply yet,
| any piece of data to which a unique identifier can be assigned;
| that would be just about anything.

Not really; the point is that a data element must have a
definition (data element concept), an identifier, and a specific
representation (not all data elements have permissible values).

Part 3 says,
3.4  data element concept : A concept which can be represented in the form
of a data element, described independently of any particular representation.

"Length" is a data element concept; it can be represented as
inches, centimeters, and so on.  Only when a data element
concept is conjoined with a representation does it become
a data element (which may have different names in different
contexts).  From my notes on the 1999 Open Forum:

"ISO 11179 Past, Present, Future - a Thumbnail Sketch," by
Bruce Bargmeyer, EPA.  Distinguishes real world (an apple),
data (a sticker on the apple with a number on it) and
metadata (information about that number).  A data element
*concept* such as "U.S. State Identifier" could be a set of full names
(Alaska, California), abbreviations (AK, CA), or numeric codes
(01, 12).

That's a data element that has permissible values.
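
A minimal Python sketch of the distinction, for illustration only.  The class
and field names are hypothetical, and the value sets hold just the sample
values mentioned in the talk, not the full code lists.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElementConcept:
    """A concept described independently of any particular representation."""
    name: str


@dataclass(frozen=True)
class DataElement:
    """A concept conjoined with one specific representation."""
    concept: DataElementConcept
    name: str
    representation: str
    # None means the element has no enumerated set of permissible values.
    permissible_values: Optional[FrozenSet[str]] = None

    def accepts(self, value: str) -> bool:
        return self.permissible_values is None or value in self.permissible_values


state = DataElementConcept("U.S. State Identifier")
state_abbrev = DataElement(state, "state-abbreviation",
                           "two-letter abbreviation", frozenset({"AK", "CA"}))
state_name = DataElement(state, "state-name",
                         "full name", frozenset({"Alaska", "California"}))

length = DataElementConcept("Length")
length_inches = DataElement(length, "length-inches", "inches")
```

state_abbrev and state_name are two data elements sharing one concept;
length_inches is a data element without permissible values, so accepts()
does not restrict it.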

| More helpfully (and I feel sure this is the passage that Una is
| referring to), Informative Annex A of Part 1 (Section A.1.2.1)
| says:
|
|    A data element then is a single unit of data that in a certain
|    context is considered indivisible.  It is a unit of data
|    representing a single fact about a type of object (object
|    class) in the natural world.

The passage in full:

Bytes and bits are also components of data.  Although they may be
used to record data elements in an electronic medium, they do not
correspond to data elements.  In a database, a data element may be
implemented as a field or column.  In Chen's ER data model, it is an
attribute (see Figure B-5).  A data element then is a single unit of
data that in a certain context is considered indivisible.  It is a
unit of data representing a single fact about a type of object
(object class) in the natural world. (For example, a one character
code with allowed values of "M" or "S" representing the marital status
attribute of an "employee" object class.)  It cannot be decomposed
into more fundamental segments of information that have useful meanings
within the scope of its application.  Data elements are thus defined
as relevant to the user within the user's universe of discourse. 
Data elements are electronic or written representations of the
properties of natural-world object classes.

| Here the operative word for me is "context."  Consider a purchase
| order.  From the RDBMS viewpoint that dominates most of 11179,
| this is indeed a collection of individual data elements, each of
| which an organization might well want to register individually.
| But in some processing contexts, the purchase order itself is the
| indivisible unit, and we might with equal justification wish to
| register that.  The DTD for a purchase order certainly represents
| "a single fact about a type of object ... in the natural world,"
| namely the grammar of a purchase order.  (An objection that a DTD
| is not a single fact would, in my opinion, reflect an unexamined
| notion of what a fact is.  Surely the characters composing the
| string that expresses a name in a database are themselves "facts"
| about that name, and the individual digits of the Unicode
| representation of each character are "facts" about that
| character, and so on.)

And we could represent a DTD as a composite data element or as
a data element dictionary that contains composite data elements.
For mere registration of DTDs in whole, we don't want to unpack
what's inside the DTD, but as we might later, I've assimilated
DTD to data element dictionary.

| I think that the question of what constitutes a data element boils
| down to how much structure it can have.  The impression one gets
| from most of 11179 is that data elements are atomic (or to be more
| accurate, that data elements are the structures that live just
| above the level of individual characters).  But 11179 is
| quite clear that data elements can contain substructures.  From
| Informative Annex A of Part 1 again:
|
|    Sometimes data elements are derived from several constituent
|    parts, where each of the parts are represented as data
|    elements.  These derivations can be of many forms.  An example
|    is concatenation for the formation of a telephone number from
|    its constituent parts.  In the U.S., telephone numbers are
|    uniquely described with ten digits, and these numbers can
|    easily be represented by a data element.  However, the
|    telephone companies (and others) need the telephone number
|    separated into area code, exchange code, and line number,
|    making three data elements.  Concatenating the area code,
|    exchange code, and line number (in the right way) allows the
|    formation of a data element representing the full telephone
|    number.
|
| The section following this passage begins "Data elements in
| relational databases appear as field labels in tables," and from
| that point on one encounters nothing but examples that reinforce a
| view of data elements as atomic data, but the fact remains that
| the standard (or at least the informative part) recognizes that
| this is not necessarily true of all data elements.  This is
| brought out even more clearly in Informative Annex A of Part 3
| (Section A.3.1):
|
|    In a data management environment it may be required to control
|    the relation between 'composite data elements' and the
|    'component data elements' which form the 'composite data
|    element'.
|
|    Example:
|
|    The composite data element: 'address' may be composed of the
|    component data elements: 'name of addressee', 'street name',
|    'street number', 'city name', 'postal code', 'country name'.
|
| So while I agree with Una that 11179 is not particularly friendly
| to tree-structured information elements, it doesn't rule them out,
| and I see nothing in 11179 that would prevent its use in
| registering DTDs and other complex data structures.
|
| While this is the kind of simplification that is almost guaranteed
| to get me in trouble, I will go even further and say that as far
| as I can tell, "11179 data element (type)" and "XML element
| (type)" can be considered, if not identical, at least isomorphic
| concepts in the sense that anything that can be an XML element
| type can be registered as a 11179 data element type.  (The same is
| not true, I hasten to add, of "11179 attribute" and "XML
| attribute.")

Yes.  And XML attributes can be considered as 11179
data elements (and when they have specified values, they
can be considered as data elements with sets of permissible values).

| What seems to me to be missing from 11179 is the concept of a
| hierarchy of data elements, and if this is what's bothering Una,
| then I admit to sharing the same concern.  It appears that if I
| want to register a purchase order DTD and also register the
| transaction number in that DTD, I will have to register two
| separate items that are, as far as the registry is concerned, at
| the same level logically and functionally.  While workable, this
| doesn't map well to an XML way of thinking about data structures.
| However, it's hard to see just now what would.

If you register the DTD as a data element dictionary, then you
can make the data element dictionary contain the data element
for transaction number.  It is certainly true that 11179 doesn't
go far in contemplating a registry that is more than one big
data element dictionary, but that doesn't seem a serious problem.

| Perhaps the best way to look at this is that 11179 provides a way
| to register both schemas at the top level and individual "fields"
| at the bottom level and that we must look to schemas themselves to
| provide a way to describe the relation between top and bottom.  I
| believe that this is what Terry meant when he wrote
|
| | Eventually we may want to be able to stitch together 11179 and
| | some revision of XML Schema so as to give continuity from top to
| | bottom

Yes.  There isn't any immediate point in describing transaction
number as a 11179 data element if we already have it described
in an XML Schema/DTD representation (which is what we want to use
directly anyway) inside the schema/DTD.  So it would be nice
to be able to shift gears.

However, there is a nonimmediate point in so describing transaction
number:  doing so enables harmonization of data elements that
share the same data element concept but have different
representations (and different names) in different contexts.  
That's what the people who have data sets on their hands are
excited about, and it's something EBXML probably needs.
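
As a rough Python illustration of that harmonization, assuming each
registered data element carries a pointer to its data element concept.  The
contexts, names, and representations below are invented sample data.

```python
from collections import defaultdict

# (context, local name, data element concept, representation) -- sample data
registered = [
    ("purchasing", "TransactionNumber", "transaction number", "10-digit string"),
    ("invoicing", "txn_id", "transaction number", "integer"),
    ("customs", "OriginCountry", "country of origin code", "two-letter code"),
]

# Group the registered data elements by the concept they represent.
by_concept = defaultdict(list)
for context, name, concept, representation in registered:
    by_concept[concept].append((context, name, representation))

# Concepts with more than one data element are candidates for harmonization.
harmonizable = {concept: elements
                for concept, elements in by_concept.items()
                if len(elements) > 1}
```

In this sample, TransactionNumber and txn_id end up together under
"transaction number" even though their names and representations differ.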

| In the meantime, I don't see any basic problem with telling
| submitters to the repository to register whole DTDs (or, as in the
| case of DocBook, DTD modules) and to put definitions of individual
| elements into the associated documentation.  The fact that
| submitters have the option of also registering individual
| low-level XML elements (i.e., element types) or even individual
| XML attributes (i.e., attribute names) doesn't strike me as a big
| problem; on the contrary, we can view the job of developing
| standard DTDs over the next few years as one of abstracting out
| the common substructures of various standard DTDs and registering
| them separately so that they can constitute the common components
| from which further DTDs will be constructed.

I think so too.  Thanks, Jon.

regards, Terry