ubl-ndrsc message

Subject: RE: [ubl-ndrsc] Rule: 115 and 116 Containers
From: "Burcham, Bill" <Bill_Burcham@stercomm.com>
To: 'Eduardo Gutentag' <Eduardo.Gutentag@Sun.COM>
Date: Thu, 17 Jul 2003 18:22:13 -0500
Your counterexample, Eduardo, is double-bogus since in the first place your
two docs carry different content, and in the second, your second doc won't
validate under any scheme I've heard proposed in UBL, since the (repeated)
Fruit elements have elements stuck between 'em.

Would this pair have made my example stronger?

<?xml version="1.0" encoding="UTF-8"?>
<Groceries>
	<SuperfluousFruitContainer>
		<Fruit>Apple</Fruit>
		<Fruit>Orange</Fruit>
		<Fruit>Banana</Fruit>
	</SuperfluousFruitContainer>
	<SuperfluousVegetableContainer>
		<Vegetable>Celery</Vegetable>
		<Vegetable>Lettuce</Vegetable>
	</SuperfluousVegetableContainer>
</Groceries>

<?xml version="1.0" encoding="UTF-8"?>
<Groceries>
	<Fruit>Apple</Fruit>
	<Fruit>Orange</Fruit>
	<Fruit>Banana</Fruit>
	<Vegetable>Celery</Vegetable>
	<Vegetable>Lettuce</Vegetable>
</Groceries>

All my previous arguments hold equally for these two as well.
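
One caveat for the flat Groceries document: a position() = last() test
would not fire on the last Fruit there, since the Vegetables follow it
under the same parent.  A sibling-axis test gets around that.  Here is a
minimal sketch, assuming XSLT 1.0; the output names (NewGroceries and the
Before.../A.../After... elements) are made up for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<!-- NewGroceries is a hypothetical output root. -->
	<xsl:template match="Groceries">
		<NewGroceries>
			<xsl:apply-templates select="*"/>
		</NewGroceries>
	</xsl:template>
	<!-- One template serves both repeated elements. -->
	<xsl:template match="Fruit | Vegetable">
		<!-- First of its kind: no earlier sibling has the same name. -->
		<xsl:if test="not(preceding-sibling::*[name() = name(current())])">
			<xsl:element name="Before{name()}"/>
		</xsl:if>
		<xsl:element name="A{name()}">
			<xsl:value-of select="."/>
		</xsl:element>
		<!-- Last of its kind: no later sibling has the same name. -->
		<xsl:if test="not(following-sibling::*[name() = name(current())])">
			<xsl:element name="After{name()}"/>
		</xsl:if>
	</xsl:template>
</xsl:transform>
```

Run against the second document above, this should wrap the three fruits
in BeforeFruit/AfterFruit markers and the two vegetables in
BeforeVegetable/AfterVegetable markers, with no container elements needed
in the source.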

And you may be right -- that the first stylesheet could be cut in half.  And
I suspect that if you found such a transformation, you could pretty much
apply it to the second stylesheet and cut it in half too.  There just isn't
much difference between the two approaches when it comes to XSLT.
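
As a rough sketch of the kind of tightening I have in mind, here it is
applied to the no-container stylesheet: literal result elements instead of
xsl:element, and select="*" and select="." instead of current()/* and
text().  This is an illustration only, not a demonstrated halving, and it
assumes doc holds nothing but Fruit children, as in the original example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:template match="doc">
		<!-- A literal result element replaces xsl:element. -->
		<NewDoc><xsl:apply-templates select="*"/></NewDoc>
	</xsl:template>
	<xsl:template match="Fruit">
		<!-- position() counts among the element children selected above. -->
		<xsl:if test="position() = 1"><BeforeFruit/></xsl:if>
		<AFruit><xsl:value-of select="."/></AFruit>
		<xsl:if test="position() = last()"><AfterFruit/></xsl:if>
	</xsl:template>
</xsl:transform>
```

The same three substitutions carry over unchanged to the container
stylesheet, so whatever one version gains, the other gains too.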

-----Original Message-----
From: Eduardo Gutentag [mailto:Eduardo.Gutentag@Sun.COM] 
Sent: Thursday, July 17, 2003 6:09 PM
To: Burcham, Bill
Cc: 'Chin Chee-Kai'; UBL-NDR
Subject: Re: [ubl-ndrsc] Rule: 115 and 116 Containers


Bill, I think your argument is bogus.

The alternative to

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<SuperfluousContainer>
		<Fruit>Apple</Fruit>
		<Fruit>Orange</Fruit>
		<Fruit>Banana</Fruit>
	</SuperfluousContainer>
</doc>

is not, in real life,

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<Fruit>Apple</Fruit>
	<Fruit>Orange</Fruit>
	<Fruit>Banana</Fruit>
</doc>

but more probably

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<someelement>foo</someelement>
	<Fruit>Apple</Fruit>
	<anotherone>bar</anotherone>
	<Fruit>Orange</Fruit>
	<alongcontainerlikeaddress>
		<a>
			<b>
				<c>foo</c>
			</b>
		</a>
	</alongcontainerlikeaddress>
	<Fruit>Banana</Fruit>
</doc>

Also, although I have neither the time nor the inclination to check this
(I am on vacation, after all), I believe your first stylesheet is far more
complicated than it needs to be for the container case. I believe it can
be cut in half -- but again, I have not checked this; it's just based on
previous experience with stylesheets.

Burcham, Bill wrote:
> I'm with Chee-Kai -- I think [R 116] is wrong.  (I know it's probably 
> too late -- but I'm gonna say my piece anyway :-) The two cases I've 
> heard made in favor of it are:
> 
> 1. container elements foster more readable stylesheets
> 2. container elements significantly improve document processing 
> performance
> 
> Argument 1 is weak.  Forgive me for posting working code, but here is 
> an instance document with superfluous containers:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <doc>
> 	<SuperfluousContainer>
> 		<Fruit>Apple</Fruit>
> 		<Fruit>Orange</Fruit>
> 		<Fruit>Banana</Fruit>
> 	</SuperfluousContainer>
> </doc>
> 
> And here is a stylesheet to process it:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:transform version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 	<xsl:output method="xml" version="1.0" encoding="UTF-8" 
> indent="yes"/>
> 	<xsl:template match="doc">
> 		<xsl:element name="NewDoc">
> 			<xsl:apply-templates select="current()/*"/>
> 		</xsl:element>
> 	</xsl:template>
> 	<xsl:template match="SuperfluousContainer">
> 		<BeforeFruit/>
> 		<xsl:apply-templates select="current()/*"/>
> 		<AfterFruit/>
> 	</xsl:template>
> 	<xsl:template match="Fruit">
> 		<AFruit>
> 			<xsl:value-of select="text()"/>
> 		</AFruit>
> 	</xsl:template>
> </xsl:transform>
> 
> And here is the output:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <NewDoc>
> 	<BeforeFruit/>
> 	<AFruit>Apple</AFruit>
> 	<AFruit>Orange</AFruit>
> 	<AFruit>Banana</AFruit>
> 	<AfterFruit/>
> </NewDoc>
> 
> The example injects an element before the first fruit and after the 
> last one.  That's the example we've been discussing for a couple years 
> as being the bugaboo here.
> 
> And here is an analogous source instance doc -- this time with no 
> superfluous containers:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <doc>
> 	<Fruit>Apple</Fruit>
> 	<Fruit>Orange</Fruit>
> 	<Fruit>Banana</Fruit>
> </doc>
> 
> And here is a different stylesheet to process this one:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:transform version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 	<xsl:output method="xml" version="1.0" encoding="UTF-8" 
> indent="yes"/>
> 	<xsl:template match="doc">
> 		<xsl:element name="NewDoc">
> 			<xsl:apply-templates select="current()/*"/>
> 		</xsl:element>
> 	</xsl:template>
> 	<xsl:template match="Fruit">
> 		<xsl:if test="position() = 1">
> 		<BeforeFruit/>
> 		</xsl:if>
> 		<AFruit>
> 			<xsl:value-of select="text()"/>
> 		</AFruit>
> 		<xsl:if test="position() = last()">
> 		<AfterFruit/>		
> 		</xsl:if>
> 	</xsl:template>
> </xsl:transform>
> 
> Comparing the two stylesheets I note that the one for superfluous 
> containers is 19 lines and the one for repeating elements (with no 
> superfluous containers) is 20 lines.  That's only one line of code 
> difference.  And I don't think the second stylesheet is any less 
> readable than the first.
> 
> If I look at the two source documents, and extrapolate to larger 
> documents with more nesting, I can say with certainty that superfluous 
> containers make for larger documents and IMHO are a bit harder for 
> humans to read -- due to the increase in indentation necessitated by 
> the deeper hierarchy.
> 
> As for point 2 (processing performance), that's just Voodoo Computer 
> Science.  So, which XML processing tools are we using for comparison?  
> Which versions of those tools?  What is the 
> use-case/scenario/algorithm?  How big is the document?  Worst-case, if 
> you tell me that the document is HUGE then I'll tell you a) the 
> Bolivian rug-weaver using Perl as the processing tool isn't gonna see 
> the HUGE document and b) the company (Wal*Mart) that sees the HUGE 
> document can darn-well write a transform on the incoming document (or 
> four or five transforms) that make it more amenable to efficient 
> processing.
> 
> But you know what -- I still haven't seen any real _evidence_ that 
> superfluous containers provide any processing performance advantage in 
> the first place.  It's more likely they hurt performance since they 
> _definitely_ make documents larger!
> 
> So by my count, it's:
> 
> Superfluous containers:  they make documents bigger (inflicting a 
> processing burden) and harder for humans to read
> Repeated elements (no superfluous containers): they make documents 
> smaller and easier for humans to read, and necessitate a tiny bit more 
> XSLT code in some situations.
> 
> Down with [R 116]!
> 
> 
> Bill Burcham
> Sr. Software Architect, Integration Software Development
> Sterling Commerce, Inc.
> 469.524.2164
> bill_burcham@stercomm.com
> 
> -----Original Message-----
> From: Chin Chee-Kai [mailto:cheekai@softml.net]
> Sent: Wednesday, July 16, 2003 8:38 PM
> To: UBL-NDR
> Subject: Re: [ubl-ndrsc] Rule: 115 and 116 Containers
> 
> 
> 
>>>[R 115]  All documents shall have a container for metadata  and which
>>>proceeds the body of the document and is named  "Head" _____________. 
>>>(anything but header)
> 
> 
>>>[R 116]  All elements with a cardinality of 1..n, (and lack a
>>>qualifying structure) must be contained by a list container named
>>>"(name of repeating element)List", which has a cardinality of 1..1.
> 
> 
> I remain critical of having to maintain such virtual structure for no 
> apparent use.  I've heard that the rules don't affect FPSC at all.  By 
> design, they should not affect LC.  So who's benefiting from carrying 
> all the empty luggage around?
> 
> 
> That said, I pointed out last time that the [R 115] should have 
> "precedes" instead of "proceeds", unless the proponent of the rule 
> wants Head sitting at the tail.
> 
> 
> 
> Best Regards,
> Chin Chee-Kai
> SoftML
> Tel: +65-6820-2979
> Fax: +65-6743-7875
> Email: cheekai@SoftML.Net
> http://SoftML.Net/
> 
> 
> On Wed, 16 Jul 2003, Lisa-Aeon wrote:
> 
> 
>>>Rules for Voting:  Each email will have only one rule in it, I will
>>>try to mark the rules that group with it, or rules that might 
>>>duplicate it.  The membership has 5 working days to bring forth 
>>>objection or discussion, after the 5 working days, if there are no 
>>>objections, the rule will be assumed to be "ACCEPTED" and be given to 
>>>the LCSC for their implementation.
>>>
>>>Please Reply leaving first email in Reply.
>>>
>>>Voting period on this rule ends:  July 23, 2003
>>>
>>>*******************************
>>>I am combining the last two rules, because we have already voted on a
>>>decision.  These are the old rules:
>>>
>>>[R 115]  All documents shall have a container for metadata  and which
>>>proceeds the body of the document and is named  "Head" _____________. 
>>>(anything but header)
>>>
>>>[R 116]  All elements with a cardinality of 1..n, (and lack a
>>>qualifying structure) must be contained by a list container named
>>>"(name of repeating element)List", which has a cardinality of 1..1.
>>>
>>>These are the new rules agreed upon during the teleconference call on
>>>9 July.  These are voted as approved, just need polishing up.  To 
>>>remind everybody, here is the motion and it was approved.
>>>
>>>***Motion:(Arofan) We agree in the direction of the rules being
>>>submitted,
>>>
>>>a. Endorse the direction as indicated in this proposal.
>>>
>>>b. Authorize Arofan to make the changes that were discussed in this
>>>meeting.
>>>
>>>Changes:
>>>
>>>Substitute the word "Top" for "Head",
>>>
>>>Make sure we have explicitly covered the 1..n in the wording.
>>>
>>>c. Authorize Mark to make editorial changes.
>>>
>>>d. Submit to list for final approval. (vote by email)
>>>
>>>******
>>>Proposed full set of rules, as discussed:
>>>
>>>----------------------------------------------------------------------
>>>
>>>(1) All non-repeatable BIEs that are direct children of the
>>>document-level BIE in the model will be child elements of a generated 
>>>"Top" element in the schema. The generated "Top" element will be named 
>>>"[doctype]Top", and its content model will be a sequence. It will 
>>>reference a generated type named "[doctype]TopType". Both the 
>>>generated "Top" element and its type will be declared in the same 
>>>namespace as the document-level element. (Note: This rule implies that 
>>>all documents will have generated "Top" elements, without exception, 
>>>regardless of their other 'body' contents, to cover cases where the 
>>>document will be extended with the Context mechanism, and for general
>>>consistency.)
>>>
>>>(2) All repeatable BIEs in the model will have generated containers.
>>>The containers will be named "[name_of_repeatable_element]List". These 
>>>containers will be required if the cardinality of their contained 
>>>immediate children requires at least one; if their contained children 
>>>are optional, the container itself will be optional. At least one of 
>>>the repeatable children of the List will always be required, but there 
>>>may be more than one required child if that agrees with the 
>>>cardinality found in the business model.
>>>
>>>All "_____List" elements will reference a "_______ListType", which
>>>will be declared in the same namespace as the element that represents 
>>>the repeatable BIE in the business model. The content model of this 
>>>type will have a single child element, which will have a maximum 
>>>occurrence that reflects the maximum occurrence in the business model, 
>>>and a minimum occurrence as described in this rule, above.
>>>
>>>(NOTE: This rule applies equally to 'list' containers at the document
>>>level, and also at lower levels within the document.)
>>>
>>>(3) The document element in the schema will have a content model that
>>>is a sequence of elements, the first of which will be the "Top" 
>>>element, and the others will be the generated "List" elements, in the 
>>>order in which their contained, repeatable children appeared in the 
>>>model.
>>>
>>>(4) All elements in the generated schema that are direct children of
>>>the generated "top" elements in all documents should be gathered 
>>>together into a common aggregate type, named "TopType", which will be 
>>>declared in the Common Aggregate Types namespace. This type should be 
>>>declared abstract, and all document headers should be extensions - 
>>>even if only trivial extensions to facilitate re-naming - of this 
>>>abstract type. (Note: This rule allows for polymorphic processing of 
>>>the set of generic header elements across all document types.)
>>>
>>>
>>>---
>>>Outgoing mail is certified Virus Free.
>>>Checked by AVG anti-virus system (http://www.grisoft.com).
>>>Version: 6.0.498 / Virus Database: 297 - Release Date: 7/8/2003
>>>
> 
> 
> 
> You may leave a Technical Committee at any time by visiting 
> http://www.oasis-open.org/apps/org/workgroup/ubl-ndrsc/members/leave_workgroup.php
> 

-- 
Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
Web Technologies and Standards |         Phone:  +1 510 550 4616 x31442
Sun Microsystems Inc.          |
W3C AC Rep / OASIS TAB Chair

