ubl-lcsc message

Subject: RE: [ubl-lcsc] Processing Efficiency Test WAS Re: Position Paper on List Containers
From: Chin Chee-Kai <cheekai@softml.net>
To: A Gregory <agregory@aeon-llc.com>
Date: Tue, 2 Sep 2003 00:30:54 +0800 (SGT)
Up front, I'd like to say 3 things before I address each of your
points:

(1) My model-theoretic numbers are technology neutral.
    In other words, they are applicable regardless of whether you
    are using Saxon/Xalan, or C++, C, other Java, Perl, XSLT or
    no XSLT, 300MHz or 3.0GHz machines.  This is because the
    model-theoretic numbers give expected numbers on the incoming
    side, and pitch each application's performance against itself
    operating on different structures and values.

    Those numbers, independent of technologies used in
    implementation, are telling us (or perhaps I should say
    me and not include you) that the claims on efficiency
    quoted cannot be supported unless one prays very hard
    that the documents are going to be very very structurally
    benign, which won't and cannot be the case (quantification
    of "very very" is found in my previous email).

(2) Your presentation about the tests and performance numbers
    cannot be claimed to be representative of the whole.

    I'd like to say I'm glad you shared the steps you performed
    and the values you obtained, as that is what can be most
    convincing in bringing us to certain conclusions.  So while I
    fully agree that you could do more tests with varying variables,
    just one or a few tests cannot prove or disprove anything.

    I guess I'm disappointed that the whole UBL TC is led to argue
    for containership for so long on the grounds of just a few
    test cases.  They can be a start of an entire suite of
    what might be very convincing results, if that should happen,
    but until it is proven and conclusively presented, one cannot
    argue by extrapolation that "2 is a prime number, 3 is a prime
    number, therefore all integers are prime numbers".

(3) Zooming in on your test methodology, there is an implicit
    assumption on the use of XSLT.  This is not in the same
    breath of meaning as DOM-based processing, which doesn't
    require XSLT.  Your assumption that XSLT is the default
    processing mechanism could be a practical one (based on
    free software, open specs, many people to discuss with,
    etc), but it is certainly not the only way, nor is it a
    normative requirement of UBL that one MUST use XSLT when it
    comes to data processing and transformation.

    Furthermore, even for the one configuration you mentioned
    (Saxon, XSLT on your notebook), we don't know your notebook's
    CPU model, speed, cache size or RAM size, nor what
    XPaths you used in the XSLT.  XPaths are extremely powerful
    expressions which, when expressed ever so slightly differently,
    can result in greatly magnified performance differences.
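
    To illustrate the point about XPath phrasing, here is a minimal
    sketch in Python (not part of the tests under discussion).  It
    uses the standard library's ElementTree rather than Saxon, and
    the element names (Order, BuyerParty, Name, Line, ID) and the
    20000-line instance are hypothetical.  Two path expressions
    select the same node but ask the processor to visit very
    different numbers of elements; actual timings will vary by
    machine and implementation.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical instance: one header element followed by many line items.
lines = "".join(f"<Line><ID>{i}</ID></Line>" for i in range(20000))
order = ET.fromstring(
    "<Order><BuyerParty><Name>Acme</Name></BuyerParty>" + lines + "</Order>"
)

def timed_findall(path):
    """Return the matched text values and the elapsed seconds for one query."""
    start = time.perf_counter()
    texts = [node.text for node in order.findall(path)]
    return texts, time.perf_counter() - start

# Explicit path: checks the root's children, then the children of BuyerParty.
direct_names, direct_s = timed_findall("BuyerParty/Name")
# Descendant search: visits every element below the root.
search_names, search_s = timed_findall(".//Name")

print(f"BuyerParty/Name: {direct_s * 1e3:.3f} ms -> {direct_names}")
print(f".//Name:         {search_s * 1e3:.3f} ms -> {search_names}")
```

    Both queries return the same single name, so any timing gap
    comes from how the expression was written, not from the
    structure of the instance.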



> I guess you were not party to the original discussion, which is much
> narrower than the scope of your response. It is based on the behavior
> (including my experience with optimizations) that is entirely limited to
> your (C) below.

That's good and bad;  good because I can be a fresh listener, but
bad because I've understood the background to the presented argument
about performance gains, and cannot find new evidence to support
the claims about containership performance benefits.



> Assume a document has been received and parsed/validated
> as XML. There are, in my opinion, too many schemes to handle cross-nodal
> and business logic validation to particularly design for them: we must
> assume they are equal in all scenarios.

I don't quite understand what is meant by "cross-nodal".  But if
you mean processing with multiple UBL documents "on-hand" within
an application, I should point out, from the little bit of work
I've done (which shouldn't need any special mention), that one
can't get away from dealing with cross-document data
transformations, even in what might be a limited real-life scenario.


> The process efficiencies for containers are derived from the ubiquity of
> DOM processing as seen in common XSLT processors (Saxon and Xalan) which
> typically do not require a schema to perform their transformation
> functions. (I won't go into the way in which cross-nodal logic can be
> implemented as an XSLT transformation using schematron, but you may be
> familiar with this approach. I don't know how prevalent this is, but
> neither is it germane to the argument.) I cannot speak to the current
> implementations of other DOM processors, but I suspect that they will
> behave in similar fashion - I may be wrong about this, however, and so
> will not argue this - it would really depend on optimization, as you
> point out.

    I suppose you might have meant it as a shorthand when you
    mixed DOM, XSLT and Saxon all in one breath.

    As you know, DOM is just a model of data access built over
    an abstract XML tree.  XSLT is a transformation technology
    that was initially really meant for presentational transformation.
    It can be realized based directly on the internal abstract
    XML tree, or over a layer of DOM constructs (which will incur
    further performance cost while gaining a DOM access interface).
    And Saxon is just one form of application that implements
    XSLT.

    Each of DOM, XSLT and Saxon introduces its own performance
    penalties for different reasons.  In your timing numbers,
    there's no breakdown of the 460 milliseconds showing which
    portion is attributable to the time needs of each layer, and
    which portion is due entirely to the structure of the instance.
    Only the latter portion would really argue for you
    in terms of container benefits.
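
    As a minimal sketch of the kind of breakdown meant here (an
    illustration only, not the Saxon setup under discussion), the
    Python below times tree building and the lookup as separate
    stages for a hypothetical order with and without containers.
    The element names Header and LineItems, the 1000-line size and
    the helper names build and measure are all assumptions made for
    the example.

```python
import time
import xml.etree.ElementTree as ET

def build(with_containers, n_lines=1000):
    """Build a hypothetical order as a string, with or without containers."""
    header = "<BuyerParty><Name>Acme</Name></BuyerParty>"
    lines = "".join(f"<Line><ID>{i}</ID></Line>" for i in range(n_lines))
    if with_containers:
        header = f"<Header>{header}</Header>"
        lines = f"<LineItems>{lines}</LineItems>"
    return f"<Order>{header}{lines}</Order>"

def measure(xml_text, path):
    """Time tree building and the lookup separately, in seconds."""
    t0 = time.perf_counter()
    root = ET.fromstring(xml_text)   # stage 1: parse and build the tree
    t1 = time.perf_counter()
    name = root.findtext(path)       # stage 2: evaluate one path
    t2 = time.perf_counter()
    return {"parse_s": t1 - t0, "lookup_s": t2 - t1, "name": name}

flat = measure(build(False), "BuyerParty/Name")
contained = measure(build(True), "Header/BuyerParty/Name")
for label, result in (("without containers", flat), ("with containers", contained)):
    print(label, result)
```

    Reporting the stages separately shows how much of a total
    figure belongs to the lookup itself, which is the only part
    the instance structure could plausibly influence.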

    I don't mean to criticize the exercise, as I think some
    timing numbers are better than none, given that all of us
    are busy.  However, assuming we are all interested in getting
    to the bottom of the truth (and I'd want to support containers
    if the numbers really argue for themselves), I think we cannot
    base a conclusion on what might be programming delays (e.g. poor
    implementation of loops), internal data structure inefficiencies
    (e.g. no use of hashtables to cache already "hit" nodes),
    poor programming constructs (e.g. lack of good use of macros
    over functions), poor memory management (e.g. always relying
    on garbage collection), etc.



> I don't want to argue this ad nauseum, either, but I believe that there
> is a real processing efficiency here for large documents.

I'd welcome the claim if there were real numbers to support it.
But sorry, so far I've only seen claim statements and
inconclusive timings based on a few samples.  I don't think
one should draw conclusions based on just that.



> I guess the simple way to find out is to take a 1000+-item PO with and
> without containers, and see how long it takes to perform an XSLT
> transformation in identical circumstances (in this case, on my laptop
> and using Saxon).

You cannot, because based on the argument put forth earlier,
the claimed advantage arises when "the other" non-recurring nodes
overwhelm the recurring nodes.  I've just shown the list the
upper-bound numbers that ensure you have that condition, i.e. the
one that makes containers "useful".  And the upper bound given is
about 3 in the best-case argument for containers (again, regardless
of technology and implementations used).

When it goes beyond that, and there's a practical requirement
to process all nodes, the container element itself is dwarfed
by the 1000+ items and "the other nodes", so that its presence or
absence can easily be seen to lead to no conclusive
performance gains to speak of.

You also cannot REQUIRE use of XSLT.  This becomes a
competition of "cleverness" in implementing XSLT, and is different
from processing UBL instances at stage (C) (based on my previous
layering model of processing).   This inherent requirement of XSLT
as a normative form of comparison cannot do implementors
any good.

Arguments based on REQUIREment of XSLT (and Saxon for that matter)
in processing therefore cannot be used to support proposed
containership rules, unless further results show processing
benefits for UBL instances that are independent of technologies
used (and are in harmony with model-theoretic expected figures).




> Note that the following numbers are based *only* on
> the inclusion of a header-level container and a list of line items
> container. I have not gone through and included all of the rules
> suggested by NDR, but only the two containers specified (I don't have
> time, sadly.)
>
> The XSLT I used grabbed one header item (the company name out of the
> Buyer Party) and made a simple HTML out of it:
>
> <html><p>[buyername]</p></html>
>
> Thus, we are only processing 1 XPath here.
>
> My results:
> With Containers: 460 milliseconds
> Without Containers: 470 milliseconds
>
> Results are a net savings of 10 milliseconds when containers are used
> for an XSLT that makes only a single match.
>
> (Admittedly, this is anything but a comprehensive test, but you will
> find that your average XSLT process does a lot more than a single
> lookup.)
>
> OK - big deal - I can demonstrate a processing difference of 10
> milliseconds out of a total processing time of under 500 - a bit better
> than 2%. I suspect that this effect could be multiplied by the number of
> XPath tests in the stylesheet.
>
> So I continue my test with some more XPaths, to see if I am right. I add
> 3 more XPaths to my stylesheet: one more "header" call, and two calls
> into the line items:
>
> With Containers: 460 milliseconds
> Without Containers: 510 milliseconds

Ok, this is one step towards clarity, but not sufficient since,
as I mentioned, XSLT/XPath is only ONE way, and your configuration
of stylesheet/XPath/Saxon/Your-notebook is only ONE of the ONE ways
of doing it.




> Now we see my suspicion above borne out: we are looking at a processing
> efficiency on the order of 10%. I would argue that this is significant.

Can you really attribute the full 10% purely to structural
differences, i.e. the presence/absence of containers?

Can you be absolutely certain that some internal shortcut
optimizations didn't take place within that particular
implementation of Saxon/XSLT/XPath that led to one instance
of quickened timing, and that the same result would still hold if
the complexity gets higher?
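
One way to approach the attribution question is to repeat each
trial many times and compare the gap between the two variants
with the run-to-run spread.  The Python below is a minimal sketch
under stated assumptions: it uses the standard library's
ElementTree rather than Saxon, the instance and element names are
hypothetical, and 15 repeats is an arbitrary choice.  It only
shows the shape of such a check and proves nothing about the
figures quoted above.

```python
import statistics
import timeit
import xml.etree.ElementTree as ET

# Hypothetical 1000-line order, with and without container elements.
LINES = "".join(f"<Line><ID>{i}</ID></Line>" for i in range(1000))
HEADER = "<BuyerParty><Name>Acme</Name></BuyerParty>"
FLAT = f"<Order>{HEADER}{LINES}</Order>"
CONTAINED = f"<Order><Header>{HEADER}</Header><LineItems>{LINES}</LineItems></Order>"

def run(xml_text, path):
    """One complete trial: build the tree, then perform the lookup."""
    return ET.fromstring(xml_text).findtext(path)

def summarize(xml_text, path, repeats=15):
    """Repeat the trial and report the fastest run, the median and the spread."""
    samples = timeit.repeat(lambda: run(xml_text, path), number=1, repeat=repeats)
    return {
        "min_ms": min(samples) * 1e3,
        "median_ms": statistics.median(samples) * 1e3,
        "stdev_ms": statistics.stdev(samples) * 1e3,
    }

flat_stats = summarize(FLAT, "BuyerParty/Name")
contained_stats = summarize(CONTAINED, "Header/BuyerParty/Name")
gap_ms = abs(flat_stats["median_ms"] - contained_stats["median_ms"])
noise_ms = max(flat_stats["stdev_ms"], contained_stats["stdev_ms"])
print(flat_stats, contained_stats, sep="\n")
print("gap exceeds run-to-run spread:", gap_ms > noise_ms)
```

A single pair of timings cannot distinguish a structural effect
from ordinary variation between runs; a gap that stays well
outside the spread across many repeats is at least a candidate
for one.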




> And, presumably, the more XPaths we add, the greater the efficiency gain
> will grow.

Not yet.  "2 is a prime, 3 is a prime" doesn't prove that
all integers are prime numbers.  I'm impressed with your
boldness to claim such.



> Note that there was 0 difference in prep time and in the time required
> to build the trees (this was the same both with and without containers).

It can't be zero difference, simply because if you peek into the
internals of Saxon, time will be required to minimally allocate
structures for the extra container elements and to "walk over"
them and into their children.  The difference is probably too small
for the millisecond precision that you use to time Saxon's
performance, but the difference CAN be amplified when an instance
has many containers containing only 1 or 2 elements.
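
The amplification can be made concrete by counting nodes.  The
Python below is a minimal sketch with hypothetical element names
(Order, ItemList, Item) and an arbitrary n of 500; it takes the
extreme case where every item sits alone in its own container and
counts the elements a processor must allocate and walk over.

```python
import xml.etree.ElementTree as ET

def count_elements(xml_text):
    """Count every element node in the parsed tree, root included."""
    return sum(1 for _ in ET.fromstring(xml_text).iter())

n = 500
# Hypothetical worst case: each item sits alone inside its own container.
flat = "<Order>" + "<Item/>" * n + "</Order>"
wrapped = "<Order>" + "<ItemList><Item/></ItemList>" * n + "</Order>"

flat_count = count_elements(flat)        # root + n items
wrapped_count = count_elements(wrapped)  # root + n containers + n items
print(flat_count, wrapped_count)         # prints "501 1001"
```

With one item per container, the element count goes from n + 1 to
2n + 1, so the tree-building and traversal work roughly doubles
even though the business content is unchanged.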

And since you ignored the Stage (B) schema parsing, you've
essentially ignored the time penalty that must necessarily be
incurred during that phase.  That penalty is expected to be
much more measurable (larger), because a schema-validator has
two sets of nodes to operate on: one set is just the instance
itself, and the other is the set of all UBL schemas, now
proliferated with the many, many container types.




> The hit was taken in pure processing time. Thus, the addition of a
> couple of tags was minor - the processing penalty far outweighs it.

See above; it is not conclusive that the delay observed was due
entirely to the presence/absence of container elements.




> Now, we still must measure the relative importance of 10% processing
> efficiency for just a transformation using XSLT, and I will confess that
> I have not addresses the full scope of your response. But - since I
> don't believe we have the resources to do comprehensive testing - I
> still think my claims of significant efficiency gains in typical
> processing scenarios with large document are borne out, all other things
> being equal.

Neither do I have many resources to do so.  But as much as I've
disliked the idea right up front due to some of the numbers I could
foresee, I'd hate to leave the burden of coming up with containered
schemas entirely to Tim to bear.  So I did end up spending
some working and weekend time on container schemas.  What
I didn't know was that it was just to prove/disprove extrapolation
arguments based on a few tests, though.



Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6743-7875
Email: cheekai@SoftML.Net
http://SoftML.Net/
References:
- RE: [ubl-lcsc] Processing Efficiency Test WAS Re: Position Paper on List Containers
  - From: "A Gregory" <agregory@aeon-llc.com>