Subject: [ubl-ndrsc] Containership Proposal
Folks:

As per the discussion last Wednesday, here is a brief write-up of my arguments regarding containership.

Cheers,
Arofan

_____________ UBL Release Op70 and Containership ______________________________

Overview:

In the discussions about containership, a decision was made to wait until the Op70 release to see how the "normalization" of the LCSC modelling activities would translate into XML structures before making any decision about containership. Generally speaking, the resulting XML has produced a satisfactory level of containership. There are two areas where there are problems, however: at the very top level, among the children of document elements (Order, etc.); and in those cases where a child element can be repeated many times, producing a "list" of like elements.

These two cases are examined primarily in terms of their effect on XML processing, and whether they will prove sub-optimal from the point of view of XML processing with common tools and technologies. This argument also considers how easily the XML structures in these cases can be understood, and whether their usability might be enhanced by the existence of additional containers. The issue of whether these containers represent semantic constructs is left open for discussion, as there seems to be some disagreement on this point; it is assumed that this discussion will take place as the arguments presented here are considered.

Issues:

As currently structured, the immediate child elements of a UBL document are of two types: the "header" elements, appearing first in the document as a set of immediate children, and then a set of "item" elements, which in other vocabularies typically make up the "body" section of a document. This structuring is problematic for a number of reasons:

(1) Usability: It is easier to see the distinction between these two types of child elements if they are organized into two groups, a "header" and a "body".
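To make the two candidate shapes concrete, here is a minimal sketch in Python using the standard library's ElementTree. The element names (BuyerParty, SellerParty, LineItem, OrderHeader, OrderBody) are illustrative placeholders, not the actual Op70 names:

```python
import xml.etree.ElementTree as ET

# Current Op70 shape: header-type and item-type elements are siblings,
# all immediate children of the document element.
flat = ET.fromstring(
    "<Order>"
    "<BuyerParty/><SellerParty/>"
    "<LineItem/><LineItem/>"
    "</Order>"
)

# Proposed shape: two containers separate the header group from the body group.
contained = ET.fromstring(
    "<Order>"
    "<OrderHeader><BuyerParty/><SellerParty/></OrderHeader>"
    "<OrderBody><LineItem/><LineItem/></OrderBody>"
    "</Order>"
)

print([child.tag for child in flat])       # header and item elements mixed at one level
print([child.tag for child in contained])  # just two containers at the top level
```

The flat form prints all four element names at the top level; the contained form prints only the two container names, making the header/body distinction visible at a glance.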
Even if this is merely the result of traditional, presentation-based structuring of vocabularies, it is still the case that many developers (and other users) will find a document-level element broken out into two sections, header and body, easier to work with. This is not our primary argument here, but, as we will see below, it becomes more important when we look at the use of extensions.

(2) DOM Processing Efficiency: Because many common XML tools (notably XSLT and XSL-FO processors) use DOM structures to represent XML in memory, we need to look at how well optimized the existing structures are for this type of processing. When a specific element is selected from a DOM representation, the nodes of the DOM tree must be examined to find the desired node or nodes, often without recourse to the XML schema itself. This means that the processor must examine each immediate child of the root node, select those that match the selection criteria, then examine the immediate children of the matching nodes, and so on down the tree, until all matching nodes have been found.

With the existing Op70 structures, this is potentially a problem, particularly with large documents or large stylesheets. If I want to select an item-type element from the body, I must examine a handful of "header" elements before finding the matches in the "body" section below. This is not ideal, but it is not necessarily a problem, because there are not many header-type elements. The reverse case, however, is more problematic: if I wish to select a header-type element from a document with 200 items, I must examine not only each of the relatively few header elements, but also each of the 200 item elements. When the number of potential selections in an XSLT stylesheet is considered, it becomes clear that we may have a problem.
By comparison, the existence of containers for the header and body elements would allow the processor to examine far fewer children (two at the document level, and then at most the handful of header elements at the next level). To look briefly at the arithmetic: in the existing structures, given an instance with 7 header elements and 200 items, selecting a header element requires examining all 207 immediate children of the document element (plus however many nodes exist as children of each matched node). With header and body containers, the first selection examines 2 nodes, and then the 7 nodes inside the header container, for a total of 9 nodes examined.

While this cost will clearly vary with the number of items in the document instance, do we really want to design document structures that perform well only with small instances? There is no performance downside to adding a level of containership here, and only a very minor increase in the memory required to store the DOM tree being processed.

These same processing inefficiencies will exist with any element structure whose immediate children have cardinalities such as 1..n or 0..n.
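The arithmetic above can be checked with a short sketch, again using hypothetical element names and Python's standard-library ElementTree. The helper counts every immediate child inspected while resolving a child-axis selection, the way a schema-unaware DOM processor must:

```python
import xml.etree.ElementTree as ET

def children_examined(root, path):
    """Count immediate children inspected while resolving a child-axis path."""
    examined = 0
    nodes = [root]
    for step in path:
        matched = []
        for node in nodes:
            for child in node:      # every sibling must be looked at
                examined += 1
                if child.tag == step:
                    matched.append(child)
        nodes = matched
    return examined

# Build the two shapes: 7 header-type elements, 200 item-type elements.
headers = "".join(f"<Header{i}/>" for i in range(7))
items = "<LineItem/>" * 200

flat = ET.fromstring(f"<Order>{headers}{items}</Order>")
contained = ET.fromstring(
    f"<Order><OrderHeader>{headers}</OrderHeader>"
    f"<OrderBody>{items}</OrderBody></Order>"
)

print(children_examined(flat, ["Header3"]))                      # 207: every child scanned
print(children_examined(contained, ["OrderHeader", "Header3"]))  # 9: 2 containers + 7 header children
```

Selecting one header element in the flat shape touches all 207 siblings; in the contained shape it touches the 2 containers and then the 7 header children, matching the 207-versus-9 figures in the argument.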