Subject: Re: [ubl-ndrsc] Containership Proposal
I quite agree w/Arofan.

On Fri, 2003-03-07 at 10:58, A Gregory wrote:
> Folks:
>
> As per the discussion last Wednesday, here is a brief write-up of my
> arguments regarding containership.
>
> Cheers,
>
> Arofan
>
> _____________
>
> UBL Release Op70 and Containership
> ______________________________
>
> Overview:
>
> In the discussions about containership, a decision had been made to wait
> until the Op70 release to see how the "normalization" of the LCSC modelling
> activities would translate into XML structures, before making any decision
> about containership. Generally speaking, the resulting XML has produced a
> satisfactory level of containership. There are two areas where there are
> problems, however: at the very top level, looking at the children of
> document elements (Order, etc.); and in those cases where a child element
> could be repeated many times, producing a "list" of like elements.
>
> These two cases are examined primarily in terms of their effect on XML
> processing, and whether they will prove to be sub-optimized from the point
> of view of XML processing with common tools/technologies. This argument also
> looks at the ease of comprehension of the XML structures in these cases,
> however, and whether the usability of the XML structures might be enhanced
> by the existence of additional containers in these two cases.
>
> The issue of whether these containers represent semantic constructs is left
> open for discussion, as it seems there may be some disagreement on this
> point. It is assumed that this discussion will take place as the arguments
> presented here are considered.
>
> Issues:
>
> As currently structured, the immediate child elements of a UBL document are
> of two types - the "header" elements, appearing first in the document, as a
> set of immediate children, and then a set of "item" elements, which in other
> vocabularies typically make up the "body" section of a document.
> This structuring is problematic for a number of reasons:
>
> (1) Usability:
>
> It is easier to see the distinction between these two types of child
> elements if they are organized into two groups - a "header" and a "body".
> Even if this is merely the result of traditional, presentation-based
> structuring of vocabularies, it is still the case that many developers (and
> other users) will find having the document-level element broken out into two
> sections - header and body - easier to work with. This is not our primary
> argument here, but, as we will see below, it becomes more important when we
> look at the use of extensions.
>
> (2) DOM Processing Efficiency:
>
> Because many common XML tools use DOM structures to represent XML in
> memory - notably XSLT and XSL-FO - we need to look at how well optimized the
> existing structures are for this type of processing. When a specific element
> is selected from a DOM representation, the nodes of the DOM tree must be
> examined to find the desired node or nodes, often without recourse to the
> XML schema itself. This means that the processor must examine each immediate
> child of the root node, select those that match the selection criteria, and
> then examine the immediate children of the matching nodes, and so on down
> the tree, until the matching nodes have been found.
>
> With the existing Op70, this is potentially a problem, particularly with
> large documents, or with some large stylesheets. If I want to select an
> item-type element from the body, I will have to examine a handful of
> "header" elements before finding the matches in the "body" section below.
> This is not ideal, but is not necessarily a problem, because there are not a
> large number of header-type elements. The reverse case, however, is more
> problematic.
> If I wish to select a header-type element from a document with
> 200 items, then I will need to examine not only each of the relatively few
> header elements, but also each of the 200 item elements. When the number of
> potential selects in an XSLT stylesheet is considered, for example, then we
> will see that we may have a problem.
>
> By comparison, the existence of containers for the header and body elements
> would allow the processor to examine many fewer children (two at the
> document level, and then at most the handful of header elements at that
> level). To briefly look at the way the numbers work: in the existing
> structures, in an instance with 7 header elements and 200 items, to select a
> header element I would need to examine the 207 immediate children of the
> document element (and then however many nodes existed as children of each
> matched node); with header and body container elements, the first selection
> makes me examine 2 nodes, and then the 7 different nodes inside the header
> element (total nodes examined = 9).
>
> While this will clearly vary with the number of items in the document
> instance, do we really want to design document structures that perform well
> only with small instances? There is no performance down-side to adding a
> level of containership here, and only a very minor impact on the amount of
> memory required to store the DOM tree being processed.
>
> These same processing inefficiencies will exist with any element structure
> any of whose immediate children have cardinalities such as 1..n or 0..n.
> From a processing perspective, "list containers" would make the selection of
> these children - and other, non-repeating children with the same parent -
> much more efficient. When all of the elements of this type in a message are
> considered, the processing efficiency could be compromised.
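
The 207-versus-9 arithmetic above can be checked with a small sketch. It is only an illustration under stated assumptions: the element names (HeaderElement0..6, Item, Header, Body) are hypothetical and not taken from Op70, and select() models the naive level-by-level scan described in the text, with no indexing or early exit. A real DOM or XSLT processor may well behave differently.

```python
import xml.etree.ElementTree as ET

HEADERS, ITEMS = 7, 200  # the figures used in the write-up


def build_flat(headers, items):
    """Document whose header and item elements are all immediate children."""
    order = ET.Element("Order")
    for i in range(headers):
        ET.SubElement(order, f"HeaderElement{i}")  # hypothetical names
    for _ in range(items):
        ET.SubElement(order, "Item")
    return order


def build_contained(headers, items):
    """Same content, wrapped in hypothetical Header and Body containers."""
    order = ET.Element("Order")
    header = ET.SubElement(order, "Header")
    body = ET.SubElement(order, "Body")
    for i in range(headers):
        ET.SubElement(header, f"HeaderElement{i}")
    for _ in range(items):
        ET.SubElement(body, "Item")
    return order


def select(root, path):
    """Follow `path` one level at a time, testing every immediate child.

    Returns (matching elements, number of nodes examined).
    """
    examined = 0
    current = [root]
    for name in path:
        matched = []
        for parent in current:
            for child in parent:
                examined += 1
                if child.tag == name:
                    matched.append(child)
        current = matched
    return current, examined


flat_hits, flat_examined = select(
    build_flat(HEADERS, ITEMS), ["HeaderElement3"])
contained_hits, contained_examined = select(
    build_contained(HEADERS, ITEMS), ["Header", "HeaderElement3"])

print(flat_examined)       # 207
print(contained_examined)  # 9
```

Raising ITEMS leaves the contained count at 9 while the flat count grows with the number of items, which is the scaling concern raised above.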
>
> (3) Encapsulation and Java Binding:
>
> Many tools for working with XML use a Java binding that equates elements
> with Java classes, which are then provided with "get" and "set" calls for
> things like child elements or attribute values. This is true of such tools
> as JAXB from Sun, and many other, similar technologies. (If you think about
> it, this is very much the way we have done our data modelling, but in
> reverse! Each class in the data model becomes an element/type in the XML
> structures.)
>
> In object-oriented programming languages, encapsulation works to simplify
> and make more readable the code that is created. In our case, if I want all
> the "header" information in a business document, and I am using a Java
> binding as characterized above, I will need to deal with a set of a
> half-dozen or more objects in order to construct or read from the document
> object, as opposed to having a single object that encapsulates these. When I
> want to get all of the body-type elements - items - from my order, I would
> like to have a single object (a "body" object) that represented an array of
> like objects (items), as this simplifies the code that reads or creates
> these items in the document. In processing terms, the header and body
> information is quite different - often, the "header" information provides
> the context in which the items in the "body" are processed, so a division at
> this level makes a great deal of sense from the point of view of object
> encapsulation.
>
> (4) Extension Methodology:
>
> The current Op70 release is fairly adequate from this perspective, with the
> exception of the lack of division of the document into "header" and "body"
> elements. (This argument does not apply to "lists" elsewhere in the document
> structure.)
> Because XSD extension only allows us to add elements at the
> *end* of existing structures, any additions of header-type information I
> make at the document level will have to appear after all of the items in the
> document. This exacerbates all of the problems stated above: because I can't
> add an element to the header information (there not being a containing
> element for header information), I have some header information before the
> items, and some afterwards. This is extremely confusing to users, and
> suffers from all of the processing inefficiencies stated before. It is also
> susceptible to the same solution - the addition of a containing element
> around the header information, so that header extensions could be added
> there.
>
> Note that this effect also complicates SAX-type processing. Often,
> header-type information sets the stage for item processing, by establishing
> who is placing an order, for example. When processing without benefit of a
> DOM, added header information that appears _after_ the items of the document
> would require a second pass through the XML instance, to determine what to
> do with the items in the instance, assuming this header information is
> needed to understand how to process the items. This negates much of the
> efficiency advantage of SAX processing over DOM processing, namely not
> having to use memory to record the contents of the document while
> processing.
>
>
> Recommendations:
>
> The recommendations here are simple, and easily expressed as rules that
> could be automatically enforced with the scripts that generate the XML
> structures:
>
> (1) All documents are divided into a "header" section and a "body" section
> (which division is already implicit in the contents of the messages in Op70,
> a suspicious fact when you consider the usability arguments above...), using
> some simple naming rules based on the name of the document-level element.
> These constructs, if deemed to have semantic content, could appear in the
> business models; alternatively, they could appear only in the XML and the
> implementation models. (On this point, I am agnostic, but I would like to
> point out that there is no requirement here to impact the work of LCSC at
> all!)
>
> (2) All elements that have a cardinality of 1..n or 0..n should have a "list
> container", the name of which is created by adding an "s" to the end of the
> child element contained (or otherwise pluralizing it). Again, this need not
> be a semantic construct appearing in the business model, but could simply be
> a construct in the implementation model. There is no need for the work of
> LCSC to be altered.
>
>
>
> ----------------------------------------------------------------
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>

--
Eduardo Gutentag               |  e-mail: eduardo.gutentag@Sun.COM
Web Technologies and Standards |  Phone: +1 510 550 4616 x31442
Sun Microsystems Inc.          |  1800 Harrison St. Oakland, CA 94612
W3C AC Rep / OASIS TAB Chair