OASIS Mailing List Archives

 



acxo message



Subject: Re: The Simple Case


Terry has already responded to points raised by Una Kearns in her
message of 2000.03.20.  Here I will assay my own response, partly
in an attempt to engage in a meaningful conversation on this
subject but mostly, I will confess, to see whether I have
understood anything at all from my first reading of 11179.

I haven't yet gone over the regrep spec itself, so nothing in
what I have to say at this point relates in the slightest to the
regrep DTD and its connection to 11179, only to 11179 itself.

The part of Una's message that interests me most is this:

| I have spent quite a few hours reading and re-reading ISO 11179
| and also the regrep spec.
| 
| Firstly -- I feel we are trying to pigeon-hole what we are trying
| to accomplish against a specification that had a different purpose
| in mind.  From reading the EBXML mailing list -- I feel that
| there are people there of the same opinion.
| 
| There are definitely common good concepts in ISO 11179 that apply
| to registration and metadata that apply just generally to basic
| modelling of information, but there is no-way we can say we are
| 100% conformant to ISO 11179 and I don't think we would want to be
| or nobody would use the XML.org Registry and Repository.
| 
| The basic reason why is:
| 
| ISO 11179 is about registering and storing data-elements ( and
| thats it).  A data element is as they state themselves an
| undivisable unit of data examples they give themselves would be:
| country of origin code, employee number, an employee lastname,
| product description, etc...

My initial impression of 11179 was largely the same as Una's --
that this specification was designed for the registration of
labeled atomic data items like part numbers and units.  On what I
think is a closer reading I now believe that this impression is
only partly true.  What I now believe is that 11179 comprehends a
much deeper notion of what can constitute a data element but that
its authors were concerned mainly with the application of the
standard to relational databases and that this bias so strongly
permeates the examples that one gains a misleading impression of
what a data element can be.

Una says

| A data element is as they state themselves an undivisable unit
| of data

but that's not quite right.  What 11179-1 says is that a data
element is

   A unit of data for which the definition, identification,
   representation, and permissible values are specified by means
   of a set of attributes.

As far as I can tell, this says that a data element is any piece
of data about which metadata can be asserted, or more simply yet,
any piece of data to which a unique identifier can be assigned;
that would be just about anything.

More helpfully (and I feel sure this is the passage that Una is
referring to), Informative Annex A of Part 1 (Section A.1.2.1)
says:

   A data element then is a single unit of data that in a certain
   context is considered indivisible.  It is a unit of data
   representing a single fact about a type of object (object
   class) in the natural world.

Here the operative word for me is "context."  Consider a purchase
order.  From the RDBMS viewpoint that dominates most of 11179,
this is indeed a collection of individual data elements, each of
which an organization might well want to register individually.
But in some processing contexts, the purchase order itself is the
indivisible unit, and we might with equal justification wish to
register that.  The DTD for a purchase order certainly represents
"a single fact about a type of object ... in the natural world,"
namely the grammar of a purchase order.  (An objection that a DTD
is not a single fact would, in my opinion, reflect an unexamined
notion of what a fact is.  Surely the characters composing the
string that expresses a name in a database are themselves "facts"
about that name, and the individual digits of the Unicode
representation of each character are "facts" about that
character, and so on.)

I think that the question of what constitutes a data element boils
down to how much structure it can have.  The impression one gets
from most of 11179 is that data elements are atomic (or to be more
accurate, that data elements are the structures that live just
above the level of individual characters).  But 11179 is
quite clear that data elements can contain substructures.  From
Informative Annex A of Part 1 again:

   Sometimes data elements are derived from several constituent
   parts, where each of the parts are represented as data
   elements.  These derivations can be of many forms.  An example
   is concatenation for the formation of a telephone number from
   its constituent parts.  In the U.S., telephone numbers are
   uniquely described with ten digits, and these numbers can
   easily be represented by a data element.  However, the
   telephone companies (and others) need the telephone number
   separated into area code, exchange code, and line number,
   making three data elements.  Concatenating the area code,
   exchange code, and line number (in the right way) allows the
   formation of a data element representing the full telephone
   number.

The section following this passage begins "Data elements in
relational databases appear as field labels in tables," and from
that point on one encounters nothing but examples that reinforce a
view of data elements as atomic data, but the fact remains that
the standard (or at least the informative part) recognizes that
this is not necessarily true of all data elements.  This is
brought out even more clearly in Informative Annex A of Part 3
(Section A.3.1):

   In a data management environment it may be required to control
   the relation between 'composite data elements' and the
   'component data elements' which form the 'composite data
   element'.

   Example: 

   The composite data element: 'address' may be composed of the
   component data elements: 'name of addressee', 'street name',
   'street number', 'city name', 'postal code', 'country name'.

So while I agree with Una that 11179 is not particularly friendly
to tree-structured information elements, it doesn't rule them out,
and I see nothing in 11179 that would prevent its use in
registering DTDs and other complex data structures.
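
The composite/component relation described in the two annex passages
quoted above can be sketched in a few lines.  This is only an
illustration under stated assumptions: the class name DataElement, its
fields, and the concatenate helper are hypothetical and do not come
from 11179 or from the regrep spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataElement:
    """Hypothetical model of a registered data element (not taken from 11179)."""
    identifier: str
    definition: str
    # Component data elements, if this is a composite; empty for an atomic element.
    components: List["DataElement"] = field(default_factory=list)

    def is_composite(self) -> bool:
        return bool(self.components)

    def leaf_identifiers(self) -> List[str]:
        """Identifiers of the atomic elements, in declaration order."""
        if not self.components:
            return [self.identifier]
        leaves: List[str] = []
        for component in self.components:
            leaves.extend(component.leaf_identifiers())
        return leaves


# The telephone-number example from Annex A of Part 1: three component
# data elements and one composite formed from them.
area_code = DataElement("area-code", "Three-digit U.S. area code")
exchange_code = DataElement("exchange-code", "Three-digit exchange code")
line_number = DataElement("line-number", "Four-digit line number")
telephone_number = DataElement(
    "telephone-number",
    "Ten-digit U.S. telephone number",
    components=[area_code, exchange_code, line_number],
)


def concatenate(values: Dict[str, str], element: DataElement) -> str:
    """Derive a composite value by concatenating its component values in order."""
    return "".join(values[name] for name in element.leaf_identifiers())
```

In this sketch the same value can be treated as one element or as
three, depending on which identifier a given context chooses to use,
which is the point of the word "context" in the annex.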

While this is the kind of simplification that is almost guaranteed
to get me in trouble, I will go even further and say that as far
as I can tell, "11179 data element (type)" and "XML element
(type)" can be considered, if not identical, at least isomorphic
concepts in the sense that anything that can be an XML element
type can be registered as an 11179 data element type.  (The same is
not true, I hasten to add, of "11179 attribute" and "XML
attribute.")

What seems to me to be missing from 11179 is the concept of a
hierarchy of data elements, and if this is what's bothering Una,
then I admit to sharing the same concern.  It appears that if I
want to register a purchase order DTD and also register the
transaction number in that DTD, I will have to register two
separate items that are, as far as the registry is concerned, at
the same level logically and functionally.  While workable, this
doesn't map well to an XML way of thinking about data structures.
However, it's hard to see just now what would.
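
To make the concern concrete, here is a minimal sketch of a registry
in which every item sits at the same level.  It assumes nothing about
how 11179 or regrep would actually be implemented: FlatRegistry,
RegistryEntry, and the identifiers used are all hypothetical names
chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record; note that it has no parent or child link."""
    identifier: str
    kind: str
    definition: str


class FlatRegistry:
    """Hypothetical registry in which all entries are logical peers."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        if entry.identifier in self._entries:
            raise ValueError(f"already registered: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def lookup(self, identifier: str) -> RegistryEntry:
        return self._entries[identifier]

    def identifiers(self) -> List[str]:
        return sorted(self._entries)


registry = FlatRegistry()
# The whole purchase order DTD and one element type from it are registered
# as two separate items; nothing records that one contains the other.
registry.register(
    RegistryEntry("purchase-order-dtd", "DTD", "Grammar of a purchase order")
)
registry.register(
    RegistryEntry("transaction-number", "element type",
                  "Number identifying a transaction")
)
```

Any containment between the two entries would have to be recovered
from the DTD itself, since the registry records carry no such link.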

Perhaps the best way to look at this is that 11179 provides a way
to register both schemas at the top level and individual "fields"
at the bottom level and that we must look to schemas themselves to
provide a way to describe the relation between top and bottom.  I
believe that this is what Terry meant when he wrote

| Eventually we may want to be able to stitch together 11179 and
| some revision of XML Schema so as to give continuity from top to
| bottom

In the meantime, I don't see any basic problem with telling
submitters to the repository to register whole DTDs (or, as in the
case of DocBook, DTD modules) and to put definitions of individual
elements into the associated documentation.  The fact that
submitters have the option of also registering individual
low-level XML elements (i.e., element types) or even individual
XML attributes (i.e., attribute names) doesn't strike me as a big
problem; on the contrary, we can view the job of developing
standard DTDs over the next few years as one of abstracting out
the common substructures of various standard DTDs and registering
them separately so that they can constitute the common components
from which further DTDs will be constructed.

Jon

===================================================
Jon Bosak, Distinguished Engineer, Sun Microsystems
Chair, OASIS Process Advisory Committee
===================================================
   A design problem is not an optimization problem.
                           -- Christopher Alexander
===================================================


