OASIS Mailing List Archives

 



egov message



Subject: Re: [egov] Re: Need advice regarding XML performance issues


Michael:

Performance is a very loaded term.  We have had huge debates on this 
going back to 1996 on  the XML-dev list.  I will try to recall some of 
the points we agreed on.

1. Performance is affected largely by platform, programming language and 
physical memory (both heap and stack).
2. SAX is an event-based model.  I have a PPT slide that explains the 
concept of SAX very clearly at
http://www.nickull.net/presentations.html (download the one entitled 
Washington - Day Three).  SAX works by reading in an XML document as a 
one-dimensional stream of bytes.  When enough bytes have been read that 
an "event" is recognized, an event notice is dispatched up the stack.  
The event notices (callbacks on a handler, in practice) can be thought 
of as simple text messages that look something like this:

StartElement=["foo"];

The above event is the parser's way of telling the parent that 
instantiated it that it has encountered a start element named "foo". 
Once the event has been dispatched, no residual memory of the event is 
kept.  This makes SAX a preferable methodology for parsing when there 
are strict memory requirements.
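
To make the event model concrete, here is a minimal sketch using Python's 
standard xml.sax module rather than Xerces (an assumption made for 
brevity; the callback names are the same in the Java SAX API).  The class 
name EventLogger and the sample document are invented for the example, 
and the list of events exists only so the result can be shown; the parser 
itself keeps nothing once a callback returns.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler: turns each SAX callback into a short text message."""

    def __init__(self):
        super().__init__()
        # Kept only for demonstration; a real handler would usually act on
        # each event immediately and store nothing.
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(f'StartElement=["{name}"]')

    def endElement(self, name):
        self.events.append(f'EndElement=["{name}"]')

    def characters(self, content):
        text = content.strip()
        if text:  # ignore whitespace-only chunks
            self.events.append(f'Characters=["{text}"]')


handler = EventLogger()
# The parser reads the bytes in order and fires callbacks as it goes.
xml.sax.parseString(b"<foo><bar>hi</bar></foo>", handler)
for event in handler.events:
    print(event)
```

The first line printed is StartElement=["foo"], matching the notice 
shown above.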

Since XML does not contain any semantics, a parser is simply a reader. 
Nothing is done with the XML except reading it, checking it for errors 
and resolving entities (three mandatory items), plus a fourth, optional 
item: validating it against a DTD or XML Schema.  Validation also 
slows down parsing.
 
The Java SAX implementation (Xerces) accordingly has four main handlers 
(entity resolver, error handler, DTD handler and content handler, the 
last being the event handler). 
 It is up to the programmer to capture all the events that get passed up 
and do something meaningful with them.  *** This is the place where a lot 
of performance can be gained or lost!!!  Since just about all programs 
that consume XML documents will eventually do something with them, the 
skill of the programmer writing the handler code greatly affects things 
like memory, speed etc.  If you use a language like Java with automatic 
garbage collection, your memory is managed for you; however, you can 
still tune it further.  If you work in a language like C or C++ 
(ANSI), the skill of the programmer is going to affect your system's 
performance even more directly.
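
As an illustration of how much the handler code matters, the sketch below 
(again in Python's xml.sax, as an assumption of convenience) totals an 
attribute across a document while holding only two numbers in memory, no 
matter how large the input is.  The element name "order", the attribute 
"amount", and the names OrderTotalHandler, StrictErrorHandler and 
summarize are all hypothetical.

```python
import io
import xml.sax


class OrderTotalHandler(xml.sax.ContentHandler):
    """Hypothetical content handler: aggregates as it goes, stores no elements."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "order":
            self.count += 1
            self.total += float(attrs.get("amount", "0"))


class StrictErrorHandler(xml.sax.ErrorHandler):
    """Hypothetical error handler: stop on any error, ignore warnings."""

    def error(self, exception):
        raise exception

    def fatalError(self, exception):
        raise exception

    def warning(self, exception):
        pass


def summarize(xml_bytes):
    """Parse xml_bytes as a stream and return (order count, total amount)."""
    handler = OrderTotalHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setErrorHandler(StrictErrorHandler())
    parser.parse(io.BytesIO(xml_bytes))
    return handler.count, handler.total


SAMPLE = (
    b'<orders><order amount="10.5"/><order amount="4.5"/>'
    b'<order amount="5"/></orders>'
)
count, total = summarize(SAMPLE)
print(count, total)
```

A handler that instead appended every element to a list would give the 
same answer but with memory use growing with the document, which is the 
kind of difference the paragraph above is pointing at.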

3. If you need to keep a model of the XML document and run a series 
of programmatic tests against it, you will likely use the DOM.  DOM 
(Document Object Model) works by accepting the events from the SAX 
handler (although use of SAX is not mandatory) and building an 
in-memory representation of the original XML document.  Tests and 
queries can then be run against the DOM tree to test for certain 
conditions, etc.  Performance is greatly affected here by what kinds of 
tests you will run against your XML tree.  This is a point of contention 
for those who advocate XML automatically written out from a model, since 
not all object models will result in XML that is efficient to query.  
IMHO, a balance has to be struck between the modeller's requirements and 
those of the programmers and system administrators.  Anyway, XML like 
this:

<root>
  <tag-one/>
  <tag-two/>
  <tag-three/>
</root>

will be easier on processor speed than this:

<root>
  <tag-one>
     <tag-two>
        <tag-three/>
     </tag-two>
     <tag-two>
         <tag-three/>
     ...

if you are iterating through a deep tree looking for matches.
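
The flat-versus-deep point can be sketched with Python's standard 
xml.dom.minidom (an assumption; any DOM implementation would do).  The 
two sample documents use well-formed stand-in names (tag-one, tag-two, 
tag-three) for the tags above, and the deep one is closed off in the 
simplest possible way purely for the example.  The helper names walk and 
find_all are hypothetical.  Nodes visited and maximum depth are only 
rough proxies for the work a search does; they are not timings.

```python
from xml.dom import minidom

FLAT = "<root><tag-one/><tag-two/><tag-three/></root>"
DEEP = (
    "<root><tag-one>"
    "<tag-two><tag-three/></tag-two>"
    "<tag-two><tag-three/></tag-two>"
    "</tag-one></root>"
)


def walk(node, depth=0):
    """Yield (depth, element) for every element below node, depth-first."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            yield depth + 1, child
            yield from walk(child, depth + 1)


def find_all(xml_text, tag):
    """Return (matches, elements visited, maximum depth) for a full search."""
    root = minidom.parseString(xml_text).documentElement
    matches = visited = max_depth = 0
    for depth, element in walk(root):
        visited += 1
        max_depth = max(max_depth, depth)
        if element.tagName == tag:
            matches += 1
    return matches, visited, max_depth


print("flat:", find_all(FLAT, "tag-three"))
print("deep:", find_all(DEEP, "tag-three"))
```

With these samples the flat document is searched in three element visits 
at depth one, while the deep one needs five visits down to depth three.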

Summary:

I have studied the performance issues for many years and will attest 
that it is an extremely complex issue and the truths about it change 
almost monthly as parsers are upgraded, new chipsets come out, new APIs 
to the O/S are used, newer versions of garbage collection (Java, C#) are 
invented etc.  Staying up to date is almost impossible; however, there 
is a simple set of rules that can probably get you 85% of the way there.

If you are eager to delve into this subject more thoroughly, please 
contact me offline and I can provide you with some links etc.

Cheers

Duane Nickull





John.Borras@e-Envoy.gsi.gov.uk wrote:

>
> TC Members
>
> Can anyone provide any pointers or advice to Michael please?  
>
> If there are any commercial sensitivities about your advice then I'll 
> leave you to negotiate directly with him for providing that advice.
>
> John
>
>
> From: "Mike Hughes" <mwhughes@sandproof.org>
> Date: 31/12/2003 21:12
> To: <john.borras@e-envoy.gsi.gov.uk>
> Subject: Need advice regarding XML performance issues
>
> Dear Mr. Borras,
>  
> I am researching performance issues related to XML and need to speak 
> with an expert.  I understand that you are Chairman of the OASIS 
> e-Government Technical Committee.
>  
> I would appreciate it if you could reply to this email with the names 
> of people who could provide related input, particularly with regard to 
> specific standards and conditions that affect performance very 
> adversely.  Also, I need to understand what measures, commercial or 
> standards-related, are being taken to resolve such performance problems.
>  
> Thank you for any assistance you can provide.
>  
> Sincerely,
>  
> Michael W. Hughes
> Amplicast
> Erie, CO
> USA
>

-- 
Senior Standards Strategist
Adobe Systems, Inc.
http://www.adobe.com




