Subject: Re: [ubl-lcsc] Processing Efficiency Test WAS Re: Position Paper on List Containers
Greetings,

I've just tried a quick test. I used Instant Saxon to run a very simple transformation on two draft 8 invoice instances, both with 1000 lines: test.xml is the no-containers version, and test1.xml is exactly the same but with InvoiceLineList as a container for the invoice lines. (I spent no time ensuring that this was valid, as I took Saxon to be schema-agnostic.) My stylesheet just extracted the buyer party name into a simple piece of HTML.

The output below shows a reduction of more than 50% in processing time with the single list container for the invoice lines. I only ran it a few times because there was no doubt about the figures. I've attached the stylesheet and both XML instances.

Of course, this still leaves open how much importance should be placed on processing speed compared with other considerations, when processing speeds are constantly improving. My main concern remains the possibility of reduced reusability with the containers, but I for one don't need further convincing of the rather surprising improvement in performance, exactly as Arofan stated.

All the best
Stephen Green

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
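As a rough check on the reduction claimed above, the figures reported by the ten runs in the console log that follows can be averaged per instance. This is only a sketch: the numbers are copied from that log, the variable and function names are illustrative, and five runs per instance is a small sample.

```python
from statistics import mean

# Times in milliseconds, copied from the Saxon runs in the console log.
# test.xml is the instance without containers, test1.xml the one with
# the InvoiceLineList container.
no_container_exec_ms = [547, 500, 485, 500, 484]
container_exec_ms = [219, 203, 203, 219, 218]
no_container_build_ms = [407, 359, 359, 359, 359]
container_build_ms = [79, 94, 94, 78, 78]


def reduction_percent(baseline, improved):
    """Percentage by which the mean of `improved` is below the mean of `baseline`."""
    return 100.0 * (mean(baseline) - mean(improved)) / mean(baseline)


if __name__ == "__main__":
    print(f"execution: {mean(no_container_exec_ms):.1f} ms vs "
          f"{mean(container_exec_ms):.1f} ms, "
          f"reduction {reduction_percent(no_container_exec_ms, container_exec_ms):.1f}%")
    print(f"tree build: {mean(no_container_build_ms):.1f} ms vs "
          f"{mean(container_build_ms):.1f} ms, "
          f"reduction {reduction_percent(no_container_build_ms, container_build_ms):.1f}%")
```

With these figures the mean execution times come to about 503 ms and 212 ms, a reduction of roughly 58%. The mean tree-building times are about 369 ms and 85 ms, so the gap in tree building is of similar size to the gap in execution time.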
C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 157 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 407 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 547 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test1.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 157 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 79 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 219 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 156 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 359 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 500 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 157 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 359 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 485 milliseconds
C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 1859 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 359 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 500 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 156 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 359 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 484 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test1.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 157 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 94 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 203 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test1.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 156 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 94 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 203 milliseconds
C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test1.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 156 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 78 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 219 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>saxon -t test1.xml test.xsl
SAXON 6.5.2 from Michael Kay
Java version 1.1.4
Preparation time: 156 milliseconds
Processing file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml
Building tree for file:/C:/UBL-Schema-0.81-draft-8-Classic/test1.xml using class com.icl.saxon.tinytree.TinyBuilder
Tree built in 78 milliseconds
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body></body>
</html>
Execution time: 218 milliseconds

C:\UBL-Schema-0.81-draft-8-Classic>

Chin Chee-Kai <cheekai@softml.net> 09/01/03 17:51 PM >>>

Up front, I'd like to say 3 things before I address each of your points:

(1) My model-theoretic numbers are technology neutral. In other words, they apply regardless of whether you are using Saxon/Xalan, C++, C, other Java, Perl, XSLT or no XSLT, 300MHz or 3.0GHz machines. This is because the model-theoretic numbers give expected numbers on the incoming side, and pitch each application's performance against itself operating on different structures and values. Those numbers, independent of the technologies used in implementation, are telling us (or perhaps I should say me, and not include you) that the efficiency claims quoted cannot be supported unless one prays very hard that the documents are going to be very, very structurally benign, which won't and cannot be the case (the quantification of "very very" is found in my previous email).

(2) Your presentation of the tests and performance numbers cannot be claimed to be representative of the whole. I'd like to say I'm glad you shared the steps performed and the values obtained, as that is what can be most convincing in bringing us to certain conclusions. So while I fully agree that you could do more tests with varying variables, just one or a few tests cannot prove or disprove anything. I guess I'm disappointed that the whole UBL TC has been led to argue for containership for so long on the grounds of just a few test cases. They can be the start of an entire suite of what might be very convincing results, if that should happen, but until it is proven and conclusively presented, one cannot prove by extension by saying "2 is a prime number, 3 is a prime number, therefore all integers are prime numbers".

(3) Zooming in on your test methodology, there is an implicit assumption of the use of XSLT. This is not the same thing as DOM-based processing, which doesn't require XSLT. Your assumption that XSLT is the default processing mechanism could be a practical one (based on free software, open specs, many people to discuss with, etc.), but it is certainly not the only way, nor is it a normative requirement of UBL that one MUST use XSLT when it comes to data processing and transformation. Furthermore, for the one configuration you mentioned (Saxon, XSLT on your notebook), we don't know your notebook's CPU model, speed, cache size, RAM size, or what XPaths you used in the XSLT. XPaths are extremely powerful expressions that, when expressed ever so slightly differently, can result in greatly magnified performance differences.

> I guess you were not party to the original discussion, which is much narrower than the scope of your response.
> It is based on the behavior (including my experience with optimizations) that is entirely limited to your (C) below.

That's good and bad; good because I can be a fresh listener, but bad because I've understood the background to the presented argument about performance gains, and cannot find new evidence to support the claims about containership performance benefits.

> Assume a document has been received and parsed/validated as XML. There are, in my opinion, too many schemes to handle cross-nodal and business logic validation to particularly design for them: we must assume they are equal in all scenarios.

I don't quite understand what is meant by "cross-nodal". But if you mean processing with multiple UBL documents "on-hand" within an application, I need to highlight, from a little bit of work done that shouldn't need any special mention, that one can't run away from dealing with cross-document data transformations, even in what might be a limited real-life scenario.

> The process efficiencies for containers are derived from the ubiquity of DOM processing as seen in common XSLT processors (Saxon and Xalan) which typically do not require a schema to perform their transformation functions. (I won't go into the way in which cross-nodal logic can be implemented as an XSLT transformation using schematron, but you may be familiar with this approach. I don't know how prevalent this is, but neither is it germane to the argument.) I cannot speak to the current implementations of other DOM processors, but I suspect that they will behave in similar fashion - I may be wrong about this, however, and so will not argue this - it would really depend on optimization, as you point out.

I suppose you might have meant it as shorthand when you mixed DOM, XSLT and Saxon all in one breath. As you know, DOM is just a model of data access built over an abstract XML tree. XSLT is a transformation technology that was initially really meant for presentational transformation.
It can be realized directly on the internal abstract XML tree, or over a layer of DOM constructs (which incurs a further performance cost but gains the DOM access interface). And Saxon is just one application that implements XSLT. Each of DOM, XSLT and Saxon introduces its own performance penalties for different reasons. In your timing numbers, there's no breakdown of the 460 milliseconds showing which portion is attributable to the time needs of each layer, and which portion is due entirely to the structure of the instance. Only the latter would really argue for you in terms of container benefits.

I don't mean to criticize the exercise, as I think some timing numbers are better than none, given that all of us are busy. However, assuming we are all interested in digging to the bottom of the truth, and I'd want to support containers if the numbers really argue for themselves, I think we cannot base a conclusion on what might be programming delays (e.g. poor implementation of loops), internal data structure inefficiencies (e.g. no use of hashtables to cache already "hit" nodes), poor programming constructs (e.g. lack of good use of macros over functions), poor memory management (e.g. always relying on garbage collection), etc.

> I don't want to argue this ad nauseam, either, but I believe that there is a real processing efficiency here for large documents.

I'd welcome the claim if there were real numbers to support it. But sorry, so far I've only seen claim statements and inconclusive timings based on a few samples. I don't think one should draw a conclusion based on just that.

> I guess the simple way to find out is to take a 1000+-item PO with and without containers, and see how long it takes to perform an XSLT transformation in identical circumstances (in this case, on my laptop and using Saxon).

You cannot, because based on the argument put forth earlier, the claimed advantage was when "the other" non-recurring nodes overwhelm the recurring nodes.
I've just shown the list the upper-bound numbers that ensure you have that condition, to make containers "useful". And the upper bound given is about 3 in the best-case argument for containers (again, regardless of the technology and implementations used). When it goes beyond that, and there's a practical requirement to process all nodes, then the container element itself is dwarfed by the 1000+ items and "the other nodes", so that its presence or absence can easily be seen to lead to no conclusive performance gains to speak of.

You also cannot REQUIRE use of XSLT. This becomes a competition on "cleverness" in implementing XSLT, and is different from processing UBL instances at stage (C) (based on my previous layering model of processing). This inherent requirement of XSLT as a normative form of comparison cannot bring good to implementors. Arguments based on a REQUIREment of XSLT (and Saxon, for that matter) in processing therefore cannot be used to support the proposed containership rules, unless further results show processing benefits for UBL instances that are independent of the technologies used (and are in harmony with the model-theoretic expected figures).

> Note that the following numbers are based *only* on the inclusion of a header-level container and a list of line items container. I have not gone through and included all of the rules suggested by NDR, but only the two containers specified (I don't have time, sadly.) The XSLT I used grabbed one header item (the company name out of the Buyer Party) and made a simple HTML out of it: <html><p>[buyername]</p></html>
>
> Thus, we are only processing 1 XPath here. My results:
>
> With Containers: 460 milliseconds
> Without Containers: 470 milliseconds
>
> Results are a net savings of 10 milliseconds when containers are used for an XSLT that makes only a single match. (Admittedly, this is anything but a comprehensive test, but you will find that your average XSLT process does a lot more than a single lookup.)
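For reference, the saving implied by the two single-XPath timings just quoted can be worked out as a fraction of the no-container time. This is a minimal sketch of that arithmetic only; the function name is illustrative, and it says nothing about whether a 10 millisecond gap is larger than the measurement noise.

```python
def relative_saving(with_ms, without_ms):
    """Saving from containers, as a percentage of the no-container time."""
    return 100.0 * (without_ms - with_ms) / without_ms


if __name__ == "__main__":
    # 460 ms with containers, 470 ms without, as quoted above.
    print(f"{relative_saving(460, 470):.1f}%")  # prints 2.1%
```

The same function applies unchanged to any other pair of with/without timings.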
> OK - big deal - I can demonstrate a processing difference of 10 milliseconds out of a total processing time of under 500 - a bit better than 2%. I suspect that this effect could be multiplied by the number of XPath tests in the stylesheet. So I continue my test with some more XPaths, to see if I am right. I add 3 more XPaths to my stylesheet: one more "header" call, and two calls into the line items:
>
> With Containers: 460 milliseconds
> Without Containers: 510 milliseconds

OK, this is one step towards clarity, but not sufficient, as I mentioned that XSLT/XPath is only ONE way, and your configuration of stylesheet/XPath/Saxon/your-notebook is only ONE of the ONE ways of doing it.

> Now we see my suspicion above borne out: we are looking at a processing efficiency on the order of 10%. I would argue that this is significant.

Can you really attribute the full 10% purely to structural differences in the presence/absence of containers? Can you be absolutely certain that some internal shortcut optimizations didn't take place within that particular implementation of Saxon/XSLT/XPath that led to one instance of quickened timing, and that the same can safely be said if the complexity gets higher?

> And, presumably, the more XPaths we add, the greater the efficiency gain will grow.

Not yet. "2 is a prime, 3 is a prime" doesn't prove that all integers are prime numbers. I'm impressed with your boldness to claim such.

> Note that there was 0 difference in prep time and in the time required to build the trees (this was the same both with and without containers).

It can't be zero difference, simply because, if you peek into the internals of Saxon, time will be required at minimum to allocate structures for the extra container elements and to "walk over" them and into their children. The difference is probably too small for the millisecond precision that you use to time the Saxon performance, but the difference CAN be amplified when an instance has many containers containing only 1 or 2 elements.
And since you ignored the Stage (B) schema parsing, you've essentially ignored the time penalty that must necessarily be incurred during that phase. That penalty is expected to be much more measurable (larger), because a schema validator has two sets of nodes to operate on: one set is just the instance itself, and the other is the set of all UBL schemas, now proliferated with the many, many container types.

> The hit was taken in pure processing time. Thus, the addition of a couple of tags was minor - the processing penalty far outweighs it.

See above: it is not conclusive that the delay observed was due entirely to the presence/absence of container elements.

> Now, we still must measure the relative importance of 10% processing efficiency for just a transformation using XSLT, and I will confess that I have not addressed the full scope of your response. But - since I don't believe we have the resources to do comprehensive testing - I still think my claims of significant efficiency gains in typical processing scenarios with large documents are borne out, all other things being equal.

I don't have many resources to do so either. But as much as I disliked the idea right up front, due to some of the numbers I could foresee, I hate to leave the burden of coming up with containered schemas entirely to Tim to bear. So I did end up spending some working and weekend time on container schemas. What I didn't know was that it was just to prove/disprove extrapolation arguments based on a few tests, though.

Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/
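One way to act on the objection that a handful of runs proves little is to repeat the measurement many times and report the spread alongside the central value. The sketch below is illustrative only. It uses Python's standard xml.etree.ElementTree rather than Saxon or XSLT, the element names other than InvoiceLineList are hypothetical and not taken from the UBL draft 8 schemas, and the toy documents are far simpler than real invoices. It makes no claim about which structure is faster in general.

```python
import statistics
import time
import xml.etree.ElementTree as ET


def make_invoice(lines, use_container):
    """Build a toy invoice; names other than InvoiceLineList are hypothetical."""
    line_xml = "".join(
        f"<InvoiceLine><ID>{i}</ID><Amount>1.00</Amount></InvoiceLine>"
        for i in range(lines)
    )
    if use_container:
        line_xml = f"<InvoiceLineList>{line_xml}</InvoiceLineList>"
    return (
        "<Invoice><BuyerParty><Name>Example Buyer</Name></BuyerParty>"
        f"{line_xml}</Invoice>"
    )


def buyer_name(doc):
    """Parse the document and perform the single header lookup."""
    root = ET.fromstring(doc)
    return root.findtext("BuyerParty/Name")


def time_runs(doc, repeats=30):
    """Return per-run wall-clock times in milliseconds for parse plus lookup."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        buyer_name(doc)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


if __name__ == "__main__":
    for label, flag in (("without container", False), ("with container", True)):
        samples = time_runs(make_invoice(1000, flag))
        print(f"{label}: median {statistics.median(samples):.2f} ms, "
              f"stdev {statistics.stdev(samples):.2f} ms "
              f"over {len(samples)} runs")
```

Reporting the median and standard deviation over many runs makes it easier to judge whether a gap between the two structures exceeds run-to-run variation. Separating parse time from lookup time, or swapping in a different processor, would need further instrumentation.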
To unsubscribe from this mailing list (and be removed from the roster of the OASIS TC), go to http://www.oasis-open.org/apps/org/workgroup/ubl-lcsc/members/leave_workgroup.php.
--
regards
tim mcgrath
phone: +618 93352228
postal: po box 1289 fremantle western australia 6160