Subject: Re: The Simple Case
Terry has already responded to points raised by Una Kearns in her message of 2000.03.20. Here I will assay my own response, partly in an attempt to engage in a meaningful conversation on this subject but mostly, I will confess, to see whether I have understood anything at all from my first reading of 11179. I haven't yet gone over the regrep spec itself, so nothing in what I have to say at this point relates in the slightest to the regrep DTD and its connection to 11179, only to 11179 itself.

The part of Una's message that interests me most is this:

| I have spent quite a few hours reading and re-reading ISO 11179
| and also the regrep spec.
|
| Firstly -- I feel we are trying to pigeon-hole what we are trying
| to accomplish against a specification that had a different purpose
| in mind. From reading the EBXML mailing list -- I feel that
| there are people there of the same opinion.
|
| There are definitely common good concepts in ISO 11179 that apply
| to registration and metadata that apply just generally to basic
| modelling of information, but there is no-way we can say we are
| 100% conformant to ISO 11179 and I don't think we would want to be
| or nobody would use the XML.org Registry and Repository.
|
| The basic reason why is:
|
| ISO 11179 is about registering and storing data-elements ( and
| thats it). A data element is as they state themselves an
| undivisable unit of data examples they give themselves would be:
| country of origin code, employee number, an employee lastname,
| product description, etc...

My initial impression of 11179 was largely the same as Una's -- that this specification was designed for the registration of labeled atomic data items like part numbers and units. On what I think is a closer reading I now believe that this impression is only partly true.
What I now believe is that 11179 comprehends a much deeper notion of what can constitute a data element but that its authors were concerned mainly with the application of the standard to relational databases and that this bias so strongly permeates the examples that one gains a misleading impression of what a data element can be.

Una says

| A data element is as they state themselves an undivisable unit
| of data

but that's not quite right. What 11179-1 says is that a data element is

   A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes.

As far as I can tell, this says that a data element is any piece of data about which metadata can be asserted, or more simply yet, any piece of data to which a unique identifier can be assigned; that would be just about anything.

More helpfully (and I feel sure this is the passage that Una is referring to), Informative Annex A of Part 1 (Section A.1.2.1) says:

   A data element then is a single unit of data that in a certain context is considered indivisible. It is a unit of data representing a single fact about a type of object (object class) in the natural world.

Here the operative word for me is "context." Consider a purchase order. From the RDBMS viewpoint that dominates most of 11179, this is indeed a collection of individual data elements, each of which an organization might well want to register individually. But in some processing contexts, the purchase order itself is the indivisible unit, and we might with equal justification wish to register that. The DTD for a purchase order certainly represents "a single fact about a type of object ... in the natural world," namely the grammar of a purchase order. (An objection that a DTD is not a single fact would, in my opinion, reflect an unexamined notion of what a fact is.
Surely the characters composing the string that expresses a name in a database are themselves "facts" about that name, and the individual digits of the Unicode representation of each character are "facts" about that character, and so on.)

I think that the question of what constitutes a data element boils down to how much structure it can have. The impression one gets from most of 11179 is that data elements are atomic (or to be more accurate, that data elements are the structures that live just above the level of individual characters). But 11179 is quite clear that data elements can contain substructures. From Informative Annex A of Part 1 again:

   Sometimes data elements are derived from several constituent parts, where each of the parts are represented as data elements. These derivations can be of many forms. An example is concatenation for the formation of a telephone number from its constituent parts. In the U.S., telephone numbers are uniquely described with ten digits, and these numbers can easily be represented by a data element. However, the telephone companies (and others) need the telephone number separated into area code, exchange code, and line number, making three data elements. Concatenating the area code, exchange code, and line number (in the right way) allows the formation of a data element representing the full telephone number.

The section following this passage begins "Data elements in relational databases appear as field labels in tables," and from that point on one encounters nothing but examples that reinforce a view of data elements as atomic data, but the fact remains that the standard (or at least the informative part) recognizes that this is not necessarily true of all data elements.
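
The telephone-number derivation quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DataElement class, the concatenate function, and the sample digits are hypothetical and do not come from 11179, which describes the derivation in prose only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataElement:
    """Hypothetical stand-in for a registered data element: a name and a value."""
    name: str
    value: str


def concatenate(name: str, parts: Sequence[DataElement]) -> DataElement:
    """Derive a composite data element by joining the component values in order."""
    return DataElement(name=name, value="".join(p.value for p in parts))


# Three component data elements (sample values are invented).
area_code = DataElement("area code", "415")
exchange_code = DataElement("exchange code", "555")
line_number = DataElement("line number", "0123")

# The derived element is itself a data element, just like its parts.
telephone_number = concatenate(
    "telephone number", [area_code, exchange_code, line_number]
)
print(telephone_number.name, telephone_number.value)
```

The point of the sketch is that the composite and its components share one type, so nothing in this model distinguishes an "atomic" data element from a derived one.
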
This is brought out even more clearly in Informative Annex A of Part 3 (Section A.3.1):

   In a data management environment it may be required to control the relation between 'composite data elements' and the 'component data elements' which form the 'composite data element'.

   Example: The composite data element: 'address' may be composed of the component data elements: 'name of addressee', 'street name', 'street number', 'city name', 'postal code', 'country name'.

So while I agree with Una that 11179 is not particularly friendly to tree-structured information elements, it doesn't rule them out, and I see nothing in 11179 that would prevent its use in registering DTDs and other complex data structures.

While this is the kind of simplification that is almost guaranteed to get me in trouble, I will go even further and say that as far as I can tell, "11179 data element (type)" and "XML element (type)" can be considered, if not identical, at least isomorphic concepts in the sense that anything that can be an XML element type can be registered as a 11179 data element type. (The same is not true, I hasten to add, of "11179 attribute" and "XML attribute.")

What seems to me to be missing from 11179 is the concept of a hierarchy of data elements, and if this is what's bothering Una, then I admit to sharing the same concern. It appears that if I want to register a purchase order DTD and also register the transaction number in that DTD, I will have to register two separate items that are, as far as the registry is concerned, at the same level logically and functionally. While workable, this doesn't map well to an XML way of thinking about data structures. However, it's hard to see just now what would.

Perhaps the best way to look at this is that 11179 provides a way to register both schemas at the top level and individual "fields" at the bottom level and that we must look to schemas themselves to provide a way to describe the relation between top and bottom.
I believe that this is what Terry meant when he wrote

| Eventually we may want to be able to stitch together 11179 and
| some revision of XML Schema so as to give continuity from top to
| bottom

In the meantime, I don't see any basic problem with telling submitters of the repository to register whole DTDs (or, as in the case of DocBook, DTD modules) and to put definitions of individual elements into the associated documentation. The fact that submitters have the option of also registering individual low-level XML elements (i.e., element types) or even individual XML attributes (i.e., attribute names) doesn't strike me as a big problem; on the contrary, we can view the job of developing standard DTDs over the next few years as one of abstracting out the common substructures of various standard DTDs and registering them separately so that they can constitute the common components from which further DTDs will be constructed.

Jon

===================================================
Jon Bosak, Distinguished Engineer, Sun Microsystems
Chair, OASIS Process Advisory Committee
===================================================
A design problem is not an optimization problem.
-- Christopher Alexander
===================================================