regrep message



Subject: Re: re the simple case


Jon wrote:
| Terry has already responded to points raised by Una Kearns in her
| message of 2000.03.20.  Here I will assay my own response, partly
| in an attempt to engage in a meaningful conversation on this
| subject but mostly, I will confess, to see whether I have
| understood anything at all from my first reading of 11179.
| 
| I haven't yet gone over the regrep spec itself, so nothing in
| what I have to say at this point relates in the slightest to the
| regrep DTD and its connection to 11179, only to 11179 itself.
| 
| The part of Una's message that interests me most is this:
| 
| | I have spent quite a few hours reading and re-reading ISO 11179
| | and also the regrep spec.
| | 
| | Firstly -- I feel we are trying to pigeon-hole what we are trying
| | to accomplish against a specification that had a different purpose
| | in mind.  From reading the EBXML mailing list -- I feel that
| | there are people there of the same opinion.
| | 
| | There are definitely good common concepts in ISO 11179 that apply
| | to registration and metadata, and that apply generally to basic
| | modelling of information, but there is no way we can say we are
| | 100% conformant to ISO 11179, and I don't think we would want to be,
| | or nobody would use the XML.org Registry and Repository.
| | 
| | The basic reason why is:
| | 
| | ISO 11179 is about registering and storing data elements (and
| | that's it).  A data element is, as they state themselves, an
| | indivisible unit of data; examples they give themselves would be:
| | country of origin code, employee number, an employee lastname,
| | product description, etc...
| 
| My initial impression of 11179 was largely the same as Una's --
| that this specification was designed for the registration of
| labeled atomic data items like part numbers and units.  On what I
| think is a closer reading I now believe that this impression is
| only partly true.  What I now believe is that 11179 comprehends a
| much deeper notion of what can constitute a data element but that
| its authors were concerned mainly with the application of the
| standard to relational databases and that this bias so strongly
| permeates the examples that one gains a misleading impression of
| what a data element can be.

Yes.  In fact if you extend the present OASIS spec and allow
composite data elements (which Jon gets to below) you have something
like an XML Schema without some of the bells and whistles.

| Una says
| 
| | A data element is, as they state themselves, an indivisible unit
| | of data
| 
| but that's not quite right.  What 11179-1 says is that a data
| element is
| 
|    A unit of data for which the definition, identification,
|    representation, and permissible values are specified by means
|    of a set of attributes.
| 
| As far as I can tell, this says that a data element is any piece
| of data about which metadata can be asserted, or more simply yet,
| any piece of data to which a unique identifier can be assigned;
| that would be just about anything.

Not really; the point is that a data element must have a 
definition (data element concept), an identifier, and a specific
representation (not all data elements have permissible values).

Part 3 says,
3.4  data element concept : A concept which can be represented in the form
of a data element, described independently of any particular representation.

"Length" is a data element concept; it can be represented as
inches, centimeters, and so on.  Only when a data element
concept is conjoined with a representation does it become
a data element (which may have different names in different
contexts).  From my notes on the 1999 Open Forum:

"ISO 11179 Past, Present, Future - a Thumbnail Sketch," by
Bruce Bargmeyer, EPA.  Distinguishes real world (an apple),
data (a sticker on the apple with a number on it) and
metadata (information about that number).  A data element
*concept* such as "U.S. State Identifier" could be a set of full names
(Alaska, California), abbreviations (AK, CA), or numeric codes
(01, 12).

That's a data element that has permissible values.
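
A rough sketch of that distinction in Python may help.  The class names,
field names, and identifiers below are made up for illustration; none of
them come from 11179 or from the regrep spec.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataElementConcept:
    """A concept described independently of any representation."""
    definition: str


@dataclass(frozen=True)
class DataElement:
    """A concept conjoined with an identifier and a specific representation."""
    identifier: str                     # hypothetical registry identifier
    concept: DataElementConcept
    representation: str
    # None means the element has no enumerated set of permissible values.
    permissible_values: Optional[FrozenSet[str]] = None

    def permits(self, value: str) -> bool:
        # Without an enumerated set, this sketch accepts any value.
        return self.permissible_values is None or value in self.permissible_values


state = DataElementConcept("U.S. State Identifier")
state_abbrev = DataElement("us-state-abbrev", state, "abbreviation",
                           frozenset({"AK", "CA"}))
state_name = DataElement("us-state-name", state, "full name",
                         frozenset({"Alaska", "California"}))
length_cm = DataElement("length-cm", DataElementConcept("Length"), "centimeters")
```

Here state_abbrev and state_name are two data elements sharing one data
element concept, while length_cm is a data element with no permissible
value set.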

| More helpfully (and I feel sure this is the passage that Una is
| referring to), Informative Annex A of Part 1 (Section A.1.2.1)
| says:
| 
|    A data element then is a single unit of data that in a certain
|    context is considered indivisible.  It is a unit of data
|    representing a single fact about a type of object (object
|    class) in the natural world.

The passage in full:

Bytes and bits are also components of data.  Although they may be
used to record data elements in an electronic medium, they do not 
correspond to data elements.  In a database, a data element may be 
implemented as a field or column.  In Chen's ER data model, it is an 
attribute (see Figure B-5).  A data element then is a single unit of 
data that in a certain context is considered indivisible.  It is a 
unit of data representing a single fact about a type of object 
(object class) in the natural world. (For example, a one character 
code with allowed values of "M" or "S" representing the marital status 
attribute of an "employee" object class.)  It cannot be decomposed 
into more fundamental segments of information that have useful meanings 
within the scope of its application.  Data elements are thus defined 
as relevant to the user within the user's universe of discourse.  
Data elements are electronic or written representations of the 
properties of natural-world object classes.

| Here the operative word for me is "context."  Consider a purchase
| order.  From the RDBMS viewpoint that dominates most of 11179,
| this is indeed a collection of individual data elements, each of
| which an organization might well want to register individually.
| But in some processing contexts, the purchase order itself is the
| indivisible unit, and we might with equal justification wish to
| register that.  The DTD for a purchase order certainly represents
| "a single fact about a type of object ... in the natural world,"
| namely the grammar of a purchase order.  (An objection that a DTD
| is not a single fact would, in my opinion, reflect an unexamined
| notion of what a fact is.  Surely the characters composing the
| string that expresses a name in a database are themselves "facts"
| about that name, and the individual digits of the Unicode
| representation of each character are "facts" about that
| character, and so on.)

And we could represent a DTD as a composite data element or as
a data element dictionary that contains composite data elements.
For mere registration of whole DTDs, we don't want to unpack 
what's inside the DTD, but since we might want to later, I've
assimilated the DTD to a data element dictionary.

| I think that the question of what constitutes a data element boils
| down to how much structure it can have.  The impression one gets
| from most of 11179 is that data elements are atomic (or to be more
| accurate, that data elements are the structures that live just
| above the level of individual characters).  But 11179 is
| quite clear that data elements can contain substructures.  From
| Informative Annex A of Part 1 again:
| 
|    Sometimes data elements are derived from several constituent
|    parts, where each of the parts are represented as data
|    elements.  These derivations can be of many forms.  An example
|    is concatenation for the formation of a telephone number from
|    its constituent parts.  In the U.S., telephone numbers are
|    uniquely described with ten digits, and these numbers can
|    easily be represented by a data element.  However, the
|    telephone companies (and others) need the telephone number
|    separated into area code, exchange code, and line number,
|    making three data elements.  Concatenating the area code,
|    exchange code, and line number (in the right way) allows the
|    formation of a data element representing the full telephone
|    number.
| 
| The section following this passage begins "Data elements in
| relational databases appear as field labels in tables," and from
| that point on one encounters nothing but examples that reinforce a
| view of data elements as atomic data, but the fact remains that
| the standard (or at least the informative part) recognizes that
| this is not necessarily true of all data elements.  This is
| brought out even more clearly in Informative Annex A of Part 3
| (Section A.3.1):
| 
|    In a data management environment it may be required to control
|    the relation between 'composite data elements' and the
|    'component data elements' which form the 'composite data
|    element'.
| 
|    Example: 
| 
|    The composite data element: 'address' may be composed of the
|    component data elements: 'name of addressee', 'street name',
|    'street number', 'city name', 'postal code', 'country name'.
| 
| So while I agree with Una that 11179 is not particularly friendly
| to tree-structured information elements, it doesn't rule them out,
| and I see nothing in 11179 that would prevent its use in
| registering DTDs and other complex data structures.
| 
| While this is the kind of simplification that is almost guaranteed
| to get me in trouble, I will go even further and say that as far
| as I can tell, "11179 data element (type)" and "XML element
| (type)" can be considered, if not identical, at least isomorphic
| concepts in the sense that anything that can be an XML element
| type can be registered as a 11179 data element type.  (The same is
| not true, I hasten to add, of "11179 attribute" and "XML
| attribute.")

Yes.  And XML attributes can be considered as 11179
data elements (and when they have specified values, they
can be considered as data elements with sets of permissible values).
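
The telephone-number derivation that Jon quotes from Annex A of Part 1 can
be sketched as follows.  This is only an illustration under my own
assumptions: the class names are invented, and treating each component as
a fixed-width digit string is a simplification that suits the U.S.
example, not something the standard prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ComponentElement:
    name: str
    digits: int  # fixed width, which fits the U.S. telephone example


@dataclass(frozen=True)
class CompositeElement:
    name: str
    components: Tuple[ComponentElement, ...]

    def compose(self, values: Dict[str, str]) -> str:
        """Concatenate the component values in their declared order."""
        parts = []
        for c in self.components:
            v = values[c.name]
            if len(v) != c.digits or not v.isdigit():
                raise ValueError(f"bad value for {c.name}: {v!r}")
            parts.append(v)
        return "".join(parts)

    def decompose(self, value: str) -> Dict[str, str]:
        """Split a composite value back into its component values."""
        if len(value) != sum(c.digits for c in self.components):
            raise ValueError(f"wrong length for {self.name}: {value!r}")
        out, pos = {}, 0
        for c in self.components:
            out[c.name] = value[pos:pos + c.digits]
            pos += c.digits
        return out


telephone = CompositeElement("telephone number", (
    ComponentElement("area code", 3),
    ComponentElement("exchange code", 3),
    ComponentElement("line number", 4),
))
full_number = telephone.compose(
    {"area code": "415", "exchange code": "555", "line number": "0123"})
```

The composite and each of its three components could all be registered as
data elements in their own right; the composite only records how they
relate.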

| What seems to me to be missing from 11179 is the concept of a
| hierarchy of data elements, and if this is what's bothering Una,
| then I admit to sharing the same concern.  It appears that if I
| want to register a purchase order DTD and also register the
| transaction number in that DTD, I will have to register two
| separate items that are, as far as the registry is concerned, at
| the same level logically and functionally.  While workable, this
| doesn't map well to an XML way of thinking about data structures.
| However, it's hard to see just now what would.

If you register the DTD as a data element dictionary, then you
can make the data element dictionary contain the data element
for transaction number.  It is certainly true that 11179 doesn't
go far in contemplating a registry that is more than one big
data element dictionary, but that doesn't seem a serious problem.
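
To make the containment concrete, here is a minimal sketch in Python.  The
Registry and DataElementDictionary classes and the identifiers are
hypothetical; they are not the regrep DTD's structures, just the smallest
model that shows a registered DTD containing a registered element.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataElementDictionary:
    """A registered item (here, a DTD) that contains data elements."""
    identifier: str
    elements: Dict[str, str] = field(default_factory=dict)  # name -> definition

    def add(self, name: str, definition: str) -> None:
        self.elements[name] = definition


@dataclass
class Registry:
    dictionaries: Dict[str, DataElementDictionary] = field(default_factory=dict)

    def register(self, dictionary: DataElementDictionary) -> None:
        self.dictionaries[dictionary.identifier] = dictionary

    def containing(self, element_name: str) -> List[str]:
        """Identifiers of every dictionary that contains the named element."""
        return sorted(ident for ident, d in self.dictionaries.items()
                      if element_name in d.elements)


registry = Registry()
purchase_order = DataElementDictionary("purchase-order-dtd")
purchase_order.add("transaction-number",
                   "Number identifying one purchase transaction")
registry.register(purchase_order)
```

With this shape the transaction number is not a peer of the DTD in the
registry; it is reached through the dictionary that contains it.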

| Perhaps the best way to look at this is that 11179 provides a way
| to register both schemas at the top level and individual "fields"
| at the bottom level and that we must look to schemas themselves to
| provide a way to describe the relation between top and bottom.  I
| believe that this is what Terry meant when he wrote
| 
| | Eventually we may want to be able to stitch together 11179 and
| | some revision of XML Schema so as to give continuity from top to
| | bottom

Yes.  There isn't any immediate point in describing transaction
number as a 11179 data element if we already have it described
in an XML Schema/DTD representation (which is what we want to use
directly anyway) inside the schema/DTD.  So it would be nice
to be able to shift gears.

However, there is a nonimmediate point in so describing transaction
number:  doing so enables harmonization of data elements that
share the same data element concept but have different
representations (and different names) in different contexts.   
That's what the people who have data sets on their hands are
excited about, and it's something EBXML probably needs.
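
A small sketch of what that harmonization amounts to, in Python.  The
record layout, the function name, and the sample contexts and element
names are all invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RegisteredElement(NamedTuple):
    context: str         # the DTD or data set in which the name is used
    name: str            # the name used in that context
    concept: str         # the shared data element concept
    representation: str


def harmonize(elements: List[RegisteredElement]) -> Dict[str, List[RegisteredElement]]:
    """Group elements by concept, keeping concepts used in more than one context."""
    by_concept = defaultdict(list)
    for e in elements:
        by_concept[e.concept].append(e)
    return {concept: group for concept, group in by_concept.items()
            if len({e.context for e in group}) > 1}


registered = [
    RegisteredElement("purchase-order-dtd", "transaction-number",
                      "Transaction Identifier", "10-digit string"),
    RegisteredElement("invoice-dtd", "TxnNo",
                      "Transaction Identifier", "integer"),
    RegisteredElement("invoice-dtd", "Currency",
                      "Currency Code", "three-letter code"),
]
shared = harmonize(registered)
```

The result lists "Transaction Identifier" with its two differently named,
differently represented elements, and omits "Currency Code", which appears
in only one context.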

| In the meantime, I don't see any basic problem with telling
| submitters to the repository to register whole DTDs (or, as in the
| case of DocBook, DTD modules) and to put definitions of individual
| elements into the associated documentation.  The fact that
| submitters have the option of also registering individual
| low-level XML elements (i.e., element types) or even individual
| XML attributes (i.e., attribute names) doesn't strike me as a big
| problem; on the contrary, we can view the job of developing
| standard DTDs over the next few years as one of abstracting out
| the common substructures of various standard DTDs and registering
| them separately so that they can constitute the common components
| from which further DTDs will be constructed.

I think so too.  Thanks, Jon.

regards, Terry


