ubl-ndrsc message

Subject: RE: [ubl-ndrsc] Rule: 115 and 116 Containers
From: "Jim Wilson" <jim.wilson@kcx.com>
To: "Burcham, Bill" <Bill_Burcham@stercomm.com>,"Chin Chee-Kai" <cheekai@softml.net>,"UBL-NDR" <ubl-ndrsc@lists.oasis-open.org>
Date: Thu, 17 Jul 2003 17:43:53 -0500
I don't have a vote but I'll throw in an opinion.

First of all, great discussion. I think Bill's analysis is right on.
That said, I still like container elements (key word "like"). I feel
that instance documents are slightly more intuitive and stylesheets are
more intuitive (key word "feel"). Given that "what Jim likes" is not
known to be a benefit to anyone but Jim, I certainly couldn't argue
against reversing the rule. I hope you don't though.

Regards,
Jim Wilson
CIDX
Chem eStandards & Guidelines Manager

-----Original Message-----
From: Burcham, Bill [mailto:Bill_Burcham@stercomm.com] 
Sent: Thursday, July 17, 2003 3:59 PM
To: 'Chin Chee-Kai'; UBL-NDR
Subject: RE: [ubl-ndrsc] Rule: 115 and 116 Containers


I'm with Chee-Kai -- I think [R 116] is wrong.  (I know it's probably too
late -- but I'm gonna say my piece anyway :-)
The two cases I've heard made in favor of it are:

1. container elements foster more readable stylesheets
2. container elements significantly improve document processing performance

Argument 1 is weak.  Forgive me for posting working code, but here is an
instance document with superfluous containers:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<SuperfluousContainer>
		<Fruit>Apple</Fruit>
		<Fruit>Orange</Fruit>
		<Fruit>Banana</Fruit>
	</SuperfluousContainer>
</doc>

And here is a stylesheet to process it:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:template match="doc">
		<xsl:element name="NewDoc">
			<xsl:apply-templates select="current()/*"/>
		</xsl:element>
	</xsl:template>
	<xsl:template match="SuperfluousContainer">
		<BeforeFruit/>
		<xsl:apply-templates select="current()/*"/>
		<AfterFruit/>
	</xsl:template>
	<xsl:template match="Fruit">
		<AFruit>
			<xsl:value-of select="text()"/>
		</AFruit>
	</xsl:template>
</xsl:transform>

And here is the output:

<?xml version="1.0" encoding="UTF-8"?>
<NewDoc>
	<BeforeFruit/>
	<AFruit>Apple</AFruit>
	<AFruit>Orange</AFruit>
	<AFruit>Banana</AFruit>
	<AfterFruit/>
</NewDoc>

The example injects an element before the first fruit and after the last
one.  That's the example we've been discussing for a couple years as
being the bugaboo here.

And here is an analogous source instance doc -- this time with no
superfluous containers:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<Fruit>Apple</Fruit>
	<Fruit>Orange</Fruit>
	<Fruit>Banana</Fruit>
</doc>

And here is a different stylesheet to process this one:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:template match="doc">
		<xsl:element name="NewDoc">
			<xsl:apply-templates select="current()/*"/>
		</xsl:element>
	</xsl:template>
	<xsl:template match="Fruit">
		<xsl:if test="position() = 1">
			<BeforeFruit/>
		</xsl:if>
		<AFruit>
			<xsl:value-of select="text()"/>
		</AFruit>
		<xsl:if test="position() = last()">
			<AfterFruit/>
		</xsl:if>
	</xsl:template>
</xsl:transform>

Comparing the two stylesheets, I note that the one for superfluous
containers is 19 lines and the one for repeating elements (with no
superfluous containers) is 20 lines.  That's only one line of code
difference.  And I don't think the second stylesheet is any less readable
than the first.

If I look at the two source documents and extrapolate to larger
documents with more nesting, I can say with certainty that superfluous
containers make for larger documents and IMHO are a bit harder for humans
to read -- due to the increase in indentation necessitated by the deeper
hierarchy.

As for point 2 (processing performance), that's just Voodoo Computer
Science.  So, which XML processing tools are we using for comparison?
Which versions of those tools?  What is the use-case/scenario/algorithm?
How big is the document?  Worst case, if you tell me that the document is
HUGE, then I'll tell you a) the Bolivian rug-weaver using Perl as the
processing tool isn't gonna see the HUGE document and b) the company
(Wal*Mart) that sees the HUGE document can darn well write a transform on
the incoming document (or four or five transforms) to make it more
amenable to efficient processing.
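As a rough illustration of that kind of incoming-document transform (a sketch under stated assumptions, not something proposed in this thread), the XSLT 1.0 stylesheet below takes the flat sample document and wraps its repeated Fruit elements in a single FruitList container. The FruitList name is an assumption modeled on the "(name of repeating element)List" pattern of [R 116]; the doc and Fruit names are simply those of the samples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: wraps the repeated Fruit children of doc in a
     hypothetical FruitList container (name assumed from the
     "(name of repeating element)List" pattern of [R 116]). -->
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<!-- Drop whitespace-only text nodes so indent="yes" controls layout -->
	<xsl:strip-space elements="*"/>
	<!-- Identity template: copy every node and attribute unchanged
	     unless a more specific template matches -->
	<xsl:template match="@*|node()">
		<xsl:copy>
			<xsl:apply-templates select="@*|node()"/>
		</xsl:copy>
	</xsl:template>
	<xsl:template match="doc">
		<xsl:copy>
			<xsl:apply-templates select="@*"/>
			<!-- Copy any children other than Fruit first -->
			<xsl:apply-templates select="node()[not(self::Fruit)]"/>
			<!-- Emit the container only if there is at least one Fruit -->
			<xsl:if test="Fruit">
				<FruitList>
					<xsl:apply-templates select="Fruit"/>
				</FruitList>
			</xsl:if>
		</xsl:copy>
	</xsl:template>
</xsl:transform>
```

Applied to the flat sample, it should produce the container-style shape of the first sample, with FruitList in place of SuperfluousContainer. It moves every Fruit after doc's other children, which is harmless here but would need care wherever sibling order matters.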

But you know what -- I still haven't seen any real _evidence_ that
superfluous containers provide any processing performance advantage in
the first place.  It's more likely they hurt performance since they
_definitely_ make documents larger!

So by my count, it's:

Superfluous containers: they make documents bigger (inflicting a
processing burden) and harder for humans to read.

Repeated elements (no superfluous containers): they make documents
smaller and easier for humans to read, and necessitate a tiny bit more
XSLT code in some situations.

Down with [R 116]!


Bill Burcham
Sr. Software Architect, Integration Software Development
Sterling Commerce, Inc.
469.524.2164
bill_burcham@stercomm.com

-----Original Message-----
From: Chin Chee-Kai [mailto:cheekai@softml.net] 
Sent: Wednesday, July 16, 2003 8:38 PM
To: UBL-NDR
Subject: Re: [ubl-ndrsc] Rule: 115 and 116 Containers


>>[R 115]  All documents shall have a container for metadata  and which 
>>proceeds the body of the document and is named  "Head" _____________. 
>>(anything but header)

>>[R 116]  All elements with a cardinality of 1..n, (and lack a
>>qualifying structure) must be contained by a list container named
>>"(name of repeating element)List", which has a cardinality of 1..1.

I remain critical of having to maintain such virtual structure for no
apparent use.  I've heard that the rules don't affect FPSC at all.  By
design, they should not affect LC.  So who's benefiting from carrying all
the empty luggage around?


That said, I pointed out last time that [R 115] should have "precedes"
instead of "proceeds", unless the proponent of the rule wants Head
sitting at the tail.



Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/


On Wed, 16 Jul 2003, Lisa-Aeon wrote:

>>Rules for Voting:  Each email will have only one rule in it; I will
>>try to mark the rules that group with it, or rules that might
>>duplicate it.  The membership has 5 working days to bring forth
>>objection or discussion.  After the 5 working days, if there are no
>>objections, the rule will be assumed to be "ACCEPTED" and be given to
>>the LCSC for their implementation.
>>
>>Please Reply leaving first email in Reply.
>>
>>Voting period on this rule ends:  July 23, 2003
>>
>>*******************************
>>I am combining the last two rules, because we have already voted on a 
>>decision.  These are the old rules:
>>
>>[R 115]  All documents shall have a container for metadata  and which 
>>proceeds the body of the document and is named  "Head" _____________. 
>>(anything but header)
>>
>>[R 116]  All elements with a cardinality of 1..n, (and lack a
>>qualifying structure) must be contained by a list container named
>>"(name of repeating element)List", which has a cardinality of 1..1.
>>
>>These are the new rules agreed upon during the teleconference call on
>>9 July.  These were voted as approved and just need polishing up.  To
>>remind everybody, here is the motion, and it was approved.
>>
>>***Motion:(Arofan) We agree in the direction of the rules being 
>>submitted, a. Endorse the direction as indicated in this proposal.
>>
>>b. Authorize Arofan to make the changes that were discussed in this 
>>meeting.
>>
>>Changes:
>>
>>Substitute the word "Top" for "Head",
>>
>>Make sure we explicitly cover the 1..n in the wording.
>>
>>c. Authorize Mark to make editorial changes.
>>
>>d. Submit to list for final approval. (vote by email)
>>
>>******
>>Proposed full set of rules, as discussed:
>>
>>--------------------------------------------------------------------------------
>>
>>(1) All non-repeatable BIEs that are direct children of the
>>document-level BIE in the model will be child elements of a generated
>>"Top" element in the schema. The generated "Top" element will be named
>>"[doctype]Top", and its content model will be a sequence. It will
>>reference a generated type named "[doctype]TopType". Both the
>>generated "Top" element and its type will be declared in the same
>>namespace as the document-level element. (Note: This rule implies that
>>all documents will have generated "Top" elements, without exception,
>>regardless of their other 'body' contents, to cover cases where the
>>document will be extended with the Context mechanism, and for general
>>consistency.)
>>
>>(2) All repeatable BIEs in the model will have generated containers.
>>The containers will be named "[name_of_repeatable_element]List". These
>>containers will be required if the cardinality of their contained
>>immediate children requires at least one; if their contained children
>>are optional, the container itself will be optional. At least one of
>>the repeatable children of the List will always be required, but there
>>may be more than one required child if that agrees with the
>>cardinality found in the business model.
>>
>>All "_____List" elements will reference a "_______ListType", which
>>will be declared in the same namespace as the element that represents
>>the repeatable BIE in the business model. The content model of this
>>type will have a single child element, which will have a maximum
>>occurrence that reflects the maximum occurrence in the business model,
>>and a minimum occurrence as described in this rule, above.
>>
>>(NOTE: This rule applies equally to 'list' containers at the document 
>>level, and also at lower levels within the document.)
>>
>>(3) The document element in the schema will have a content model that 
>>is a sequence of elements, the first of which will be the "Top" 
>>element, and the others will be the generated "List" elements, in the 
>>order in which their contained, repeatable children appeared in the 
>>model.
>>
>>(4) All elements in the generated schema that are direct children of 
>>the generated "Top" elements in all documents should be gathered
>>together into a common aggregate type, named "TopType", which will be 
>>declared in the Common Aggregate Types namespace. This type should be 
>>declared abstract, and all document headers should be extensions - 
>>even if only trivial extensions to facilitate re-naming - of this 
>>abstract type. (Note: This rule allows for polymorphic processing of 
>>the set of generic header elements across all document types.)
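To make rules (1) through (4) concrete, here is a hypothetical XML Schema sketch of what they might generate for an imaginary "Order" document with one non-repeatable child (IssueDate) and one repeatable child (LineItem). Every name, the urn:example:order namespace, and the element types are assumptions for illustration, not UBL definitions, and the separate Common Aggregate Types namespace that rule (4) calls for is folded into the same schema so the sketch fits in one file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch of rules (1)-(4); all names and the namespace
     are illustrative assumptions. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
	xmlns="urn:example:order"
	targetNamespace="urn:example:order"
	elementFormDefault="qualified">
	<!-- Rule (3): the document element is a sequence of the Top
	     element followed by the generated List elements -->
	<xsd:element name="Order">
		<xsd:complexType>
			<xsd:sequence>
				<xsd:element ref="OrderTop"/>
				<!-- Required because at least one LineItem is required;
				     it would get minOccurs="0" if LineItem were optional -->
				<xsd:element ref="LineItemList"/>
			</xsd:sequence>
		</xsd:complexType>
	</xsd:element>
	<!-- Rule (4): abstract aggregate type holding the Top children
	     (rule (4) places it in the Common Aggregate Types namespace) -->
	<xsd:complexType name="TopType" abstract="true">
		<xsd:sequence>
			<xsd:element name="IssueDate" type="xsd:date"/>
		</xsd:sequence>
	</xsd:complexType>
	<!-- Rule (1): the generated Top element and its type; per rule (4)
	     the type is a trivial extension of TopType -->
	<xsd:element name="OrderTop" type="OrderTopType"/>
	<xsd:complexType name="OrderTopType">
		<xsd:complexContent>
			<xsd:extension base="TopType"/>
		</xsd:complexContent>
	</xsd:complexType>
	<!-- Rule (2): the generated List container; its single child keeps
	     the business-model cardinality (1..n here) -->
	<xsd:element name="LineItemList" type="LineItemListType"/>
	<xsd:complexType name="LineItemListType">
		<xsd:sequence>
			<xsd:element ref="LineItem" minOccurs="1" maxOccurs="unbounded"/>
		</xsd:sequence>
	</xsd:complexType>
	<!-- Placeholder for the repeatable BIE itself -->
	<xsd:element name="LineItem" type="xsd:string"/>
</xsd:schema>
```

A conforming instance would nest as Order > OrderTop > IssueDate, followed by Order > LineItemList > LineItem (one or more), which is the extra level of nesting the messages above are debating.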
>>
>>


You may leave a Technical Committee at any time by visiting
http://www.oasis-open.org/apps/org/workgroup/ubl-ndrsc/members/leave_workgroup.php