ubl-ndrsc message

Subject: Re: [ubl-ndrsc] Containership Proposal
From: Eduardo Gutentag <eduardo.gutentag@sun.com>
To: ubl-ndrsc@lists.oasis-open.org
Date: Fri, 07 Mar 2003 11:23:11 -0800
I quite agree w/Arofan. 

On Fri, 2003-03-07 at 10:58, A Gregory wrote:
> Folks:
> 
> As per the discussion last Wednesday, here is a brief write-up of my
> arguments regarding containership.
> 
> Cheers,
> 
> Arofan
> 
> _____________
> 
> UBL Release Op70 and Containership
> ______________________________
> 
> Overview:
> 
> In the discussions about containership, a decision had been made to wait
> until the Op70 release to see how the "normalization" of the LCSC modelling
> activities would translate into XML structures, before making any decision
> about containership. Generally speaking, the resulting XML has produced a
> satisfactory level of containership. There are two areas where there are
> problems, however: at the very top level, looking at the children of
> document elements (Order, etc.); and in those cases where a child element
> could be repeated many times, producing a "list" of like elements.
> 
> These two cases are examined primarily in terms of their effect on XML
> processing, and whether they will prove suboptimal for processing with
> common XML tools and technologies. This argument also considers how easily
> the XML structures can be understood in these cases, however, and whether
> their usability might be enhanced by adding containers in these two cases.
> 
> The issue of whether these containers represent semantic constructs is left
> open for discussion, as it seems there may be some disagreement on this
> point. It is assumed that this discussion will take place as the arguments
> presented here are considered.
> 
> Issues:
> 
> As currently structured, the immediate child elements of a UBL document are
> of two types: the "header" elements, which appear first in the document as a
> set of immediate children, followed by a set of "item" elements, which in
> other vocabularies typically make up the "body" section of a document. This
> structuring is problematic for a number of reasons:
> 
> (1) Usability:
> 
> It is easier to see the distinction between these two types of child
> elements if they are organized into two groups - a "header" and a "body".
> Even if this is merely the result of traditional, presentation-based
> structuring of vocabularies, it is still the case that many developers (and
> other users) will find having the document-level element broken out into two
> sections - header and body - easier to work with. This is not our primary
> argument here, but, as we will see below, it becomes more important when we
> look at the use of extensions.
> 
> (2) DOM Processing Efficiency:
> 
> Because many common XML tools - notably XSLT and XSL-FO processors - use
> DOM-style tree structures to represent XML in memory, we need to look at how
> well optimized the existing structures are for this type of processing. When
> a specific element is selected from such a tree, its nodes must be examined
> to find the desired node or nodes, often without recourse to the XML schema
> itself. This means that the processor must examine each immediate child of
> the root node, select those that match the selection criteria, then examine
> the immediate children of the matching nodes, and so on down the tree, until
> the matching nodes have been found.
> 
> With the existing Op70 structures, this is potentially a problem,
> particularly with large documents or large stylesheets. If I want to select
> an item-type element from the body, I will have to examine a handful of
> "header" elements before finding the matches in the "body" section below.
> This is not ideal, but is not necessarily a problem, because there are not a
> large number of header-type elements. The reverse case, however, is more
> problematic. If I wish to select a header-type element from a document with
> 200 items, then I will need to examine not only each of the relatively few
> header elements, but also each of the 200 item elements. When we consider
> how many select expressions an XSLT stylesheet may evaluate, each repeating
> this scan, we can see that we may have a problem.
> 
> By comparison, the existence of containers for the header and body elements
> would allow the processor to examine many fewer children (two at the
> document level, and then at most the handful of header elements at the next
> level). To briefly look at the way the numbers work: in the existing
> structures, in an instance with 7 header elements and 200 items, to select a
> header element I would need to examine all 207 immediate children of the
> document element (and then however many nodes existed as children of each
> matched node); with header and body container elements, the first selection
> makes me examine 2 nodes, and then the 7 different nodes inside the header
> element (total nodes examined = 9).
> 
> While this will clearly vary with the number of items in the document
> instance, do we really want to design document structures that perform well
> only with small instances? There is no performance down-side to adding a
> level of containership here, and only a very minor impact on the amount of
> memory required to store the DOM tree being processed.
> 
> These same processing inefficiencies will exist with any element structure
> any of whose immediate children have cardinalities such as 1..n or 0..n.
> From a processing perspective, "list" containers would make the selection of
> these children - and of other, non-repeating children with the same parent -
> much more efficient. When all elements of this type in a message are
> considered, the cumulative cost to processing efficiency could be significant.
> 
> (3) Encapsulation and Java Binding:
> 
> Many tools for working with XML use a Java binding that maps elements to
> Java classes, which are then provided with "get" and "set" methods for
> things like child elements or attribute values. This is true of such tools
> as JAXB from Sun, and many other, similar technologies. (If you think about
> it, this is very much the way we have done our data modelling, but in
> reverse! Each class in the data model becomes an element/type in the XML
> structures.)
> 
> In object-oriented programming languages, encapsulation makes the resulting
> code simpler and more readable. In our case, if I want all the "header"
> information in a business document, and I am using a Java binding as
> characterized above, I will need to deal with a set of a half-dozen or more
> objects in order to construct or read from the document object, as opposed
> to having a single object that encapsulates them. When I want to get all of
> the body-type elements - items - from my order, I would like to have a
> single object (a "body" object) that represents an array of like objects
> (items), as this simplifies the code that reads or creates these items in
> the document. In processing terms, the header and body information are quite
> different - often, the "header" information provides the context in which
> the items in the "body" are processed, so a division at this level makes a
> great deal of sense from the point of view of object encapsulation.
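To illustrate the encapsulation point, here is a rough Python analogue of what a binding over header/body containers might produce; the dataclasses stand in for generated Java classes and are not JAXB's actual output. All class and field names are hypothetical. The header travels as one object that supplies context, and the body is one object holding the list of items.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class OrderLine:
    item_id: str
    quantity: int

@dataclass
class OrderHeader:
    # Hypothetical header fields, encapsulated in a single object.
    buyer: str
    seller: str
    issue_date: str

@dataclass
class OrderBody:
    # One object representing the array of like items.
    lines: List[OrderLine] = field(default_factory=list)

@dataclass
class Order:
    header: OrderHeader
    body: OrderBody

def line_summaries(header: OrderHeader, body: OrderBody) -> List[str]:
    # The header is passed as one argument and sets the context for every line.
    return [f"{header.buyer}: {line.item_id} x{line.quantity}" for line in body.lines]

def total_quantity(order: Order) -> int:
    return sum(line.quantity for line in order.body.lines)

order = Order(
    OrderHeader("Buyer Inc", "Seller Ltd", "2003-03-07"),
    OrderBody([OrderLine("A1", 2), OrderLine("B2", 3)]),
)
print(line_summaries(order.header, order.body))
print(total_quantity(order))  # 5
```

Without the containers, the equivalent code would take the half-dozen header fields and the item list as separate members of Order.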
> 
> (4) Extension Methodology:
> 
> The current Op70 release is fairly adequate from this perspective, with the
> exception of the lack of division of the document into "header" and "body"
> elements. (This argument does not apply to "lists" elsewhere in the document
> structure.) Because XSD extension only allows us to add elements at the
> *end* of existing structures, any header-type information I add at the
> document level will have to appear after all of the items in the document.
> This exacerbates all of the problems stated above: because I can't add an
> element to the header information (there being no containing element for
> header information), I have some header information before the items, and
> some after them. This is extremely confusing to users, and suffers from all
> of the processing inefficiencies stated before. It is also susceptible to
> the same solution - the addition of a containing element around the header
> information, so that header extensions could be added there.
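A small model of the ordering consequence described above, assuming content models are represented as plain lists of particle names. It mirrors the XSD rule that a type derived by extension has the base type's content followed by the extension's particles. The element names, including the extension element ContractReference, are hypothetical, and "OrderLine+" marks a 1..n item.

```python
from typing import Dict, List

def extend(base: List[str], extension: List[str]) -> List[str]:
    """Effective content of a type derived by xsd:extension: the base
    content comes first, then the particles added by the extension."""
    return base + extension

def flatten(model: List[str], types: Dict[str, List[str]]) -> List[str]:
    """Expand container particles into the document order of their contents."""
    out: List[str] = []
    for particle in model:
        if particle in types:
            out.extend(flatten(types[particle], types))
        else:
            out.append(particle)
    return out

EXTENSION = ["ContractReference"]  # hypothetical header-type extension

# Flat structure: the only type available to extend is the document type.
flat_order = extend(["IssueDate", "BuyerParty", "SellerParty", "OrderLine+"], EXTENSION)

# Contained structure: the header type can be extended instead.
contained_types = {
    "OrderHeader": extend(["IssueDate", "BuyerParty", "SellerParty"], EXTENSION),
    "OrderBody": ["OrderLine+"],
}
contained_order = flatten(["OrderHeader", "OrderBody"], contained_types)

print(flat_order)       # extension lands after the items
print(contained_order)  # extension stays with the other header information
```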
> 
> Note that this effect also complicates SAX-type processing. Often,
> header-type information sets the stage for item processing, by establishing
> who is placing an order, for example. When processing without benefit of a
> DOM, added header information that appears _after_ the items of the document
> would require a second pass through the XML instance (or holding the items
> in memory until the header information arrives) to determine what to do
> with the items, assuming this header information is needed to understand
> how to process them. This negates many of the efficiency advantages of SAX
> processing over DOM processing - chiefly, not having to hold the contents of
> the document in memory while processing.
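A sketch of the SAX cost, using Python's standard xml.sax module. Instead of a second pass, this handler buffers item lines until a header value arrives, which makes the memory cost visible. BuyerID and OrderLine are hypothetical element names, and the handler is an illustration, not a recommended design.

```python
import xml.sax

class OrderHandler(xml.sax.ContentHandler):
    """Streams OrderLine elements, using a header value (hypothetical BuyerID)
    as processing context. Lines seen before BuyerID must be held in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.buyer = None       # header context, once seen
        self.processed = []     # (buyer, line) pairs handled so far
        self.buffered = []      # lines waiting for the header
        self.max_buffered = 0   # peak number of lines held in memory
        self._current = None    # element whose text is being collected
        self._text = []

    def startElement(self, name, attrs):
        if name in ("BuyerID", "OrderLine"):
            self._current = name
            self._text = []

    def characters(self, content):
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name != self._current:
            return
        value = "".join(self._text).strip()
        self._current = None
        if name == "BuyerID":
            self.buyer = value
            # Header has finally arrived: process everything held back.
            self.processed.extend((self.buyer, line) for line in self.buffered)
            self.buffered.clear()
        elif self.buyer is None:
            self.buffered.append(value)
            self.max_buffered = max(self.max_buffered, len(self.buffered))
        else:
            self.processed.append((self.buyer, value))

def run(xml_text: str) -> OrderHandler:
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler

HEADER_FIRST = "<Order><BuyerID>B1</BuyerID><OrderLine>A</OrderLine><OrderLine>B</OrderLine></Order>"
HEADER_LAST = "<Order><OrderLine>A</OrderLine><OrderLine>B</OrderLine><BuyerID>B1</BuyerID></Order>"

print(run(HEADER_FIRST).max_buffered)  # 0: pure streaming
print(run(HEADER_LAST).max_buffered)   # 2: every line held in memory
```

Both orders produce the same result, but when the header comes last the buffer grows with the number of items.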
> 
> 
> Recommendations:
> 
> The recommendations here are simple, and easily expressed as rules that
> could be automatically enforced with the scripts that generate the XML
> structures:
> 
> (1) All documents are divided into a "header" section and a "body" section
> (which division is already implicit in the contents of the messages in Op70,
> a suspicious fact when you consider the usability arguments above...), using
> some simple naming rules based on the name of the document-level element.
> These constructs, if deemed to have semantic content, could appear in the
> business models; alternatively, they could appear only in the XML and the
> implementation models. (On this point, I am agnostic, but I would like to
> point out that there is no requirement here to impact the work of LCSC at
> all!)
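Recommendation (1) is meant to be applied by the schema-generation scripts, but the naming rule is easiest to see applied to an instance. This sketch assumes the rule is "document element name + Header/Body" (e.g. OrderHeader, OrderBody) and that the caller supplies the set of item-type tags; both the rule and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set

def add_header_body(root: ET.Element, item_tags: Set[str]) -> ET.Element:
    """Regroup the children of a document element using a hypothetical
    naming rule: <Order> gains <OrderHeader> and <OrderBody> containers."""
    header = ET.Element(root.tag + "Header")
    body = ET.Element(root.tag + "Body")
    for child in list(root):  # copy: root is modified while iterating
        root.remove(child)
        (body if child.tag in item_tags else header).append(child)
    root.append(header)
    root.append(body)
    return root

doc = ET.fromstring(
    "<Order><IssueDate>2003-03-07</IssueDate><BuyerID>B1</BuyerID>"
    "<OrderLine/><OrderLine/></Order>"
)
add_header_body(doc, {"OrderLine"})
print(ET.tostring(doc, encoding="unicode"))
```

Because the rule derives container names mechanically from the document element, a generator script could apply it to every document type without any change to the business models.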
> 
> (2) All elements that have a cardinality of 1..n or 0..n should have a "list
> container", the name of which is created by adding an "s" to the end of the
> child element contained (or otherwise pluralizing it). Again, this need not
> be a semantic construct appearing in the business model, but could simply be
> a construct in the implementation model. There is no need for the work of
> LCSC to be altered.
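A sketch of recommendation (2) as a content-model transformation that a generator script might perform. It assumes a simple particle representation (name, minOccurs, maxOccurs, with None meaning unbounded). It also assumes the container occurs once, or 0..1 when the list may be empty; that is a design choice, not something the proposal specifies. An optional table handles plurals that are not formed by adding "s".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Particle:
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1          # None stands for unbounded ("n")
    children: Tuple["Particle", ...] = ()

def container_name(child: str, irregular: Dict[str, str]) -> str:
    # Default rule: append "s"; an explicit table covers other plurals.
    return irregular.get(child, child + "s")

def add_list_containers(model: List[Particle],
                        irregular: Optional[Dict[str, str]] = None) -> List[Particle]:
    irregular = irregular or {}
    result: List[Particle] = []
    for p in model:
        if p.max_occurs is None or p.max_occurs > 1:
            # Repeating child (e.g. 1..n or 0..n): wrap it in a list container
            # that occurs at most once; optional only if the list may be empty.
            result.append(Particle(container_name(p.name, irregular),
                                   min_occurs=min(p.min_occurs, 1),
                                   max_occurs=1,
                                   children=(p,)))
        else:
            result.append(p)
    return result

order = [Particle("IssueDate"), Particle("BuyerParty"),
         Particle("OrderLine", 1, None), Particle("Note", 0, None)]
print([p.name for p in add_list_containers(order)])
# ['IssueDate', 'BuyerParty', 'OrderLines', 'Notes']
```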
> 
> 
> 
> ----------------------------------------------------------------
> To subscribe or unsubscribe from this elist use the subscription
> manager: <http://lists.oasis-open.org/ob/adm.pl>
-- 
Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
Web Technologies and Standards |         Phone:  +1 510 550 4616 x31442
Sun Microsystems Inc.          |         1800 Harrison St. Oakland, CA 94612
W3C AC Rep / OASIS TAB Chair


References:
- [ubl-ndrsc] Containership Proposal
  - From: A Gregory <agregory@aeon-llc.com>