ubl-ndrsc message

Subject: Re: [ubl-ndrsc] Containership Proposal
From: Eduardo Gutentag <eduardo.gutentag@sun.com>
To: Tim McGrath <tmcgrath@portcomm.com.au>
Date: Sun, 09 Mar 2003 16:41:58 -0800
> [...] having said that, it may be useful to canvass Ken 
> Holman's view as he is our first UBL 0p70 application builder.

Fair enough, as long as we remember that he's not the only one
in this list who has some knowledge in this area ;-)


On Sun, 2003-03-09 at 16:19, Tim McGrath wrote:
> i would like to thank arofan for this well drafted proposal, i am now 
> much clearer on the thinking behind this issue.  we are getting good at 
> these things!
> 
> firstly, i would like to suggest that the issue of creating additional 
> wrappers/containers around groups of ABIEs is entirely a technical/XML 
> processing one.  we have covered this ground before - these containers 
> have no semantic value and have no collective properties.  If they did, 
> they would be ABIEs.
> 
> secondly, i recognise the processing concerns raised and agree that we 
> should design schemas to be efficient.
> 
> thirdly, there is an assumption that documents all have a "header-body" 
> structure.  we should be aware we are on 70/30 ground here.  not all 
> documents have this pattern.  in my trade and transport experience, i 
> have seen "header" and "header-body-body" frequently.  it is just that 
> procurement has this pattern and we started with that context.  so i 
> don't think recommendation 1. works for all cases.  if it doesn't, it 
> raises the question of how and when we decide if the document suits this 
> pattern.  too hard!
> 
> however, if we adopt recommendation 2. it doesn't matter - the 'body' 
> (or 'bodies'), if they existed, would be list containers.  This is, of 
> course, not a semantic change, just schema implementation. if the NDR 
> decision was to await LC's 0p70 and having done so the processing issue 
> still exists then personally, I have no problem with this 
> recommendation.  having said that, it may be useful to canvass Ken 
> Holman's view as he is our first UBL 0p70 application builder.
> 
> 
> A Gregory wrote:
> 
> >Folks:
> >
> >As per the discussion last Wednesday, here is a brief write-up of my
> >arguments regarding containership.
> >
> >Cheers,
> >
> >Arofan
> >
> >_____________
> >
> >UBL Release 0p70 and Containership
> >______________________________
> >
> >Overview:
> >
> >In the discussions about containership, a decision had been made to wait
> >until the 0p70 release to see how the "normalization" of the LCSC modelling
> >activities would translate into XML structures, before making any decision
> >about containership. Generally speaking, the resulting XML has produced a
> >satisfactory level of containership. There are two areas where there are
> >problems, however: at the very top level, looking at the children of
> >document elements (Order, etc.); and in those cases where a child element
> >could be repeated many times, producing a "list" of like elements.
> >
> >These two cases are examined primarily in terms of their effect on XML
> >processing, and whether they will prove to be sub-optimal from the point
> >of view of XML processing with common tools/technologies. This argument also
> >looks at the easy comprehension of the XML structures in these cases,
> >however, and whether the usability of the XML structures might be enhanced
> >by the existence of additional containers in these two cases.
> >
> >The issue of whether these containers represent semantic constructs is left
> >open for discussion, as it seems there may be some disagreement on this
> >point. It is assumed that this discussion will take place as the arguments
> >presented here are considered.
> >
> >Issues:
> >
> >As currently structured, the immediate child elements of a UBL document are
> >of two types - the "header" elements, appearing first in the document, as a
> >set of immediate children, and then a set of "item" elements, which in other
> >vocabularies typically make up the "body" section of a document. This
> >structuring is problematic for a number of reasons:
> >
> >(1) Usability:
> >
> >It is easier to see the distinction between these two types of child
> >elements if they are organized into two groups - a "header" and a "body".
> >Even if this is merely the result of traditional, presentation-based
> >structuring of vocabularies, it is still the case that many developers (and
> >other users) will find having the document-level element broken out into two
> >sections - header and body - easier to work with. This is not our primary
> >argument here, but, as we will see below, it becomes more important when we
> >look at the use of extensions.
> >
> >(2) DOM Processing Efficiency:
> >
> >Because many common XML tools use DOM structures to represent XML in
> >memory - notably XSLT and XSL-FO - we need to look at how well optimized the
> >existing structures are for this type of processing. When a specific element
> >is selected from a DOM representation, the nodes of the DOM tree must be
> >examined to find the desired node or nodes, often without recourse to the
> >XML schema itself. This means that the processor must examine each immediate
> >child of the root node, select those that match the selection criteria, and
> >then examine the immediate children of the matching nodes, and so on down
> >the tree, until the matching nodes have been found.
> >
> >With the existing 0p70, this is potentially a problem, particularly with
> >large documents, or with some large stylesheets. If I want to select an
> >item-type element from the body, I will have to examine a handful of
> >"header" elements before finding the matches in the "body" section below.
> >This is not ideal, but is not necessarily a problem, because there are not a
> >large number of header-type elements. The reverse case, however, is more
> >problematic. If I wish to select a header-type element from a document with
> >200 items, then I will need to examine not only each of the relatively few
> >header elements, but also each of the 200 item elements. When the number of
> >potential selects in an XSLT stylesheet is considered, for example, then we
> >will see that we may have a problem.
> >
> >By comparison, the existence of containers for the header and body elements
> >would allow the processor to examine many fewer children (two at the
> >document level, and then at most the handful of header elements at that
> >level). To briefly look at the way the numbers work: in the existing
> >structures, in an instance with 7 header elements and 200 items, to select a
> >header element I would need to examine the 207 immediate children of the
> >document element (and then however many nodes existed as children of each
> >matched node); with header and body container elements, the first selection makes
> >me examine 2 nodes, and then the 7 different nodes inside the header element
> >(total nodes examined = 9).
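
To make the arithmetic above concrete, here is a minimal sketch that counts how many child nodes a naive, schema-unaware scan inspects in each layout. It is only an illustration: the element names (OrderHeader, OrderBody, Header0..Header6, Item) are hypothetical and not taken from the 0p70 schemas, and real XSLT/DOM processors may index or short-circuit in ways this does not model.

```python
import xml.etree.ElementTree as ET


def build_flat(headers=7, items=200):
    """Document element with header and item elements as direct children."""
    root = ET.Element("Order")
    for i in range(headers):
        ET.SubElement(root, f"Header{i}")  # hypothetical header element names
    for _ in range(items):
        ET.SubElement(root, "Item")
    return root


def build_contained(headers=7, items=200):
    """Same content, but split into hypothetical header and body containers."""
    root = ET.Element("Order")
    header = ET.SubElement(root, "OrderHeader")
    body = ET.SubElement(root, "OrderBody")
    for i in range(headers):
        ET.SubElement(header, f"Header{i}")
    for _ in range(items):
        ET.SubElement(body, "Item")
    return root


def examined(root, path):
    """Count the child nodes inspected while following `path` step by step.

    At every step, all children of each matched parent are inspected; there is
    no early exit, because a scan without schema knowledge cannot assume that
    a match is unique.
    """
    parents = [root]
    count = 0
    for name in path:
        matched = []
        for parent in parents:
            for child in parent:
                count += 1
                if child.tag == name:
                    matched.append(child)
        parents = matched
    return count


flat = examined(build_flat(), ["Header3"])
contained = examined(build_contained(), ["OrderHeader", "Header3"])
print(flat, contained)
```

Under these assumptions the flat layout inspects all 207 children of the document element, while the contained layout inspects 2 + 7 = 9.
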
> >
> >While this will clearly vary with the number of items in the document
> >instance, do we really want to design document structures that perform well
> >only with small instances? There is no performance down-side to adding a
> >level of containership here, and only a very minor impact on the amount of
> >memory required to store the DOM tree being processed.
> >
> >These same processing inefficiencies will exist with any element structure
> >any of whose immediate children have cardinalities such as 1..n or 0..n.
> >From a processing perspective, "list containers" would make the selection of
> >these children - and other, non-repeating children with the same parent -
> >much more efficient. When all of the elements of this type in a message are
> >considered, overall processing efficiency could be compromised.
> >
> >(3) Encapsulation and Java Binding:
> >
> >Many tools for working with XML use a Java binding that equates elements
> >with Java classes, which are then provided with "get" and "set" calls for
> >things like child elements or attribute values. This is true of such tools
> >as JAXB from Sun, and many other, similar technologies. (If you think about
> >it, this is very much the way we have done our data modelling, but in
> >reverse! Each class in the data model becomes an element/type in the XML
> >structures.)
> >
> >In object-oriented programming languages, encapsulation works to simplify
> >and make more readable the code that is created. In our case, if I want all
> >the "header" information in a business document, and I am using a Java
> >binding as characterized above, I will need to deal with a set of a
> >half-dozen or more objects in order to construct or read from the document
> >object, as opposed to having a single object that encapsulates these. When I
> >want to get all of the body-type elements - items - from my order, I would
> >like to have a single object (a "body" object) that represented an array of
> >like objects (items), as this simplifies the code that reads or creates
> >these items in the document. In processing terms, the header and body
> >information is quite different - often, the "header" information provides
> >the context in which the items in the "body" are processed, so a division at
> >this level makes a great deal of sense from the point of view of object
> >encapsulation.
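
The encapsulation argument can be sketched roughly as follows. This is not JAXB output and not Java: Python dataclasses stand in for generated binding classes, and every class and field name here (Party, OrderHeader, LineItem, buyer, currency, and so on) is a hypothetical placeholder rather than anything from the UBL model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Party:
    name: str


@dataclass
class OrderHeader:
    """Single object encapsulating the header-type information."""
    buyer: Party
    seller: Party
    currency: str


@dataclass
class LineItem:
    description: str
    quantity: int


@dataclass
class Order:
    """Document object with one header object and one list-like body."""
    header: OrderHeader
    body: List[LineItem] = field(default_factory=list)


def describe(order: Order) -> List[str]:
    """Process each item in the context supplied by the header."""
    h = order.header
    return [
        f"{h.buyer.name} buys {item.quantity} x {item.description} ({h.currency})"
        for item in order.body
    ]


order = Order(
    OrderHeader(Party("Acme"), Party("Widgets Ltd"), "USD"),
    [LineItem("bolt", 10), LineItem("nut", 20)],
)
lines = describe(order)
print(lines)
```

The point of the sketch is only that the code handling items receives the header as one object, instead of collecting a half-dozen sibling objects from the document element.
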
> >
> >(4) Extension Methodology:
> >
> >The current 0p70 release is fairly adequate from this perspective, with the
> >exception of the lack of division of the document into "header" and "body"
> >elements. (This argument does not apply to "lists" elsewhere in the document
> >structure.) Because XSD extension only allows us to add elements at the
> >*end* of existing structures, any additions of header-type information I
> >make at the document level will have to appear after all of the items in the
> >document. This exacerbates all of the problems stated above: because I can't
> >add an element to the header information (there not being a containing
> >element for header information), I have some header information before the
> >items, and some afterwards. This is extremely confusing to users, and
> >suffers from all of the processing inefficiencies stated before. It is also
> >susceptible to the same solution - the addition of a containing element
> >around the header information, so that header extensions could be added
> >there.
> >
> >Note that this effect also complicates SAX-type processing. Often,
> >header-type information sets the stage for item processing, by establishing
> >who is placing an order, for example. When processing without benefit of a
> >DOM, added header information that appears _after_ the items of the document
> >would require a second pass through the XML instance, to determine what to
> >do with the items in the instance, assuming this header information is
> >needed to understand how to process the items. This negates much of the
> >efficiency advantage of SAX processing over DOM processing - namely, not
> >having to use memory to record the contents of the document while processing.
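
The streaming problem described above can be illustrated with a small SAX handler. This is a sketch under assumptions: the Currency and Item elements are hypothetical stand-ins for "header information needed to process items", and buffering is used here in place of the second pass mentioned in the text (either way, the streaming benefit is lost).

```python
import xml.sax


class OrderHandler(xml.sax.ContentHandler):
    """Processes items as they stream past if the header value is known.

    Items that arrive before the (hypothetical) Currency element cannot be
    processed yet, so they have to be buffered in memory.
    """

    def __init__(self):
        super().__init__()
        self.currency = None
        self.text = []
        self.processed = []  # (item, currency) pairs handled on the fly
        self.buffered = []   # items seen before the header value arrived

    def startElement(self, name, attrs):
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if name == "Currency":
            self.currency = value
        elif name == "Item":
            if self.currency is None:
                self.buffered.append(value)
            else:
                self.processed.append((value, self.currency))
        self.text = []


def parse(xml_text):
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


# Header information before the items: everything streams.
header_first = parse(
    "<Order><Currency>USD</Currency><Item>bolt</Item><Item>nut</Item></Order>"
)
# Header information appended after the items, as an end-of-type extension
# would force: nothing can be processed until the end of the document.
header_last = parse(
    "<Order><Item>bolt</Item><Item>nut</Item><Currency>USD</Currency></Order>"
)
print(header_first.processed, header_last.buffered)
```

In the second instance every item ends up in the buffer, which is the memory cost that SAX processing is normally chosen to avoid.
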
> >
> >
> >Recommendations:
> >
> >The recommendations here are simple, and easily expressed as rules that
> >could be automatically enforced with the scripts that generate the XML
> >structures:
> >
> >(1) All documents are divided into a "header" section and a "body" section
> >(which division is already implicit in the contents of the messages in 0p70,
> >a suspicious fact when you consider the usability arguments above...), using
> >some simple naming rules based on the name of the document-level element.
> >These constructs, if deemed to have semantic content, could appear in the
> >business models; alternately, they could only appear in the XML and the
> >implementation models. (On this point, I am agnostic, but I would like to
> >point out that there is no requirement here to impact the work of LCSC at
> >all!)
> >
> >(2) All elements that have a cardinality of 1..n or 0..n should have a "list
> >container", the name of which is created by adding an "s" to the end of the
> >child element contained (or otherwise pluralizing it). Again, this need not
> >be a semantic construct appearing in the business model, but could simply be
> >a construct in the implementation model. There is no need for the work of
> >LCSC to be altered.
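
As a rough illustration of the naming rule in recommendation (2), the sketch below wraps repeatable children of an instance in a container named by appending "s". The proposal itself is about the scripts that generate the schemas, not about transforming instances; this instance-level version, the function name add_list_containers, and the ID and OrderLine element names are all assumptions made for the example. Irregular plurals are not handled.

```python
import xml.etree.ElementTree as ET


def add_list_containers(parent, repeatable):
    """Wrap children whose tag is in `repeatable` in a "<tag>s" container.

    The container takes the position of the first wrapped child; all other
    children keep their relative order.
    """
    new_children = []
    containers = {}
    for child in list(parent):
        if child.tag in repeatable:
            container = containers.get(child.tag)
            if container is None:
                # Naive pluralization: simply append "s" to the child name.
                container = ET.Element(child.tag + "s")
                containers[child.tag] = container
                new_children.append(container)
            container.append(child)
        else:
            new_children.append(child)
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)
    return parent


order = ET.fromstring(
    "<Order><ID>1</ID><OrderLine>a</OrderLine><OrderLine>b</OrderLine></Order>"
)
add_list_containers(order, {"OrderLine"})
result = ET.tostring(order, encoding="unicode")
print(result)
```

After the call, the document element has two children (ID and OrderLines) instead of one child per order line.
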
> >
> >
> >
> >----------------------------------------------------------------
> >To subscribe or unsubscribe from this elist use the subscription
> >manager: <http://lists.oasis-open.org/ob/adm.pl>
> >
> >  
> >
-- 
Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
Web Technologies and Standards |         Phone:  +1 510 550 4616 x31442
Sun Microsystems Inc.          |         1800 Harrison St. Oakland, CA 94612
W3C AC Rep / OASIS TAB Chair

References:
- [ubl-ndrsc] Containership Proposal
  - From: A Gregory <agregory@aeon-llc.com>
- Re: [ubl-ndrsc] Containership Proposal
  - From: Tim McGrath <tmcgrath@portcomm.com.au>