Subject: Re: [egov] Re: Need advice regarding XML performance issues

From: Sean McGrath <sean.mcgrath@propylon.com>
To: egov@lists.oasis-open.org
Date: Mon, 05 Jan 2004 14:29:30 +0000

At 09:05 05/01/2004 -0500, Chiusano Joseph wrote:
>Michael,
>
>Thanks for your inquiry. As I'm sure you're very much aware, the term
>"performance" is a very broad term, and can involve both hardware and
>software. In terms of XML, one consideration is the technique used to
>parse an XML document - for example, if one uses W3C DOM (Document
>Object Model) - which is a "pull" parser, they *may* possibly experience
>a lesser degree of performance then if, for example, they use a "push"
>parser such as SAX (Simple API for XML).
>
>If you would like to convey more specifics about your system and what
>you are doing, and as much information as possible regarding products,
>parsing techniques, file sizes, etc., then I believe you'll be able to
>receive a response based on your specific situation.

A technical (but I think important) point. (I'm happy to discuss further 
offline if anybody wishes to follow up.)

The distinction between DOM and SAX from a performance perspective is 
related to memory usage patterns, not push/pull.

DOM is not a pull parser; indeed, it is not a parser at all, and strictly
speaking neither is SAX. DOM and SAX are models for interacting with XML
documents. SAX tends to be so intertwined with parsing that the two are
often indistinguishable. DOM, however, is more easily separated from the
parsing process, and many DOM implementations use SAX to get access
to XML documents.
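
To illustrate the last point, here is a minimal sketch of a tree being
built on top of SAX events. It assumes Python's standard xml.sax module;
the TreeBuilder class and the dict-based node layout are made up for
illustration and are not how any particular DOM implementation does it.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical handler that turns SAX events into a nested dict tree."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = {"tag": name, "attrs": dict(attrs.items()),
                "text": "", "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._stack:
            self._stack[-1]["text"] += content

    def endElement(self, name):
        self._stack.pop()


def build_tree(data):
    """Parse XML bytes with SAX and return the in-memory tree."""
    builder = TreeBuilder()
    xml.sax.parseString(data, builder)
    return builder.root
```

The parser here knows nothing about trees. The tree is purely a product
of what the handler chooses to remember.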

SAX works by interacting with your application whilst the document is being
processed. SAX is memory efficient but restricts your visibility of the
document to the parts already processed. DOM interaction, on the other hand,
commences once the document has been processed in its entirety to create a
(typically) memory-based model of the document, giving you full visibility
of its component parts. You can see the start of the document, move to the
end of the document, step backwards, and so on.
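
A small sketch of the two interaction styles, assuming Python's standard
xml.sax and xml.dom.minidom modules. The sample document and the function
names are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

XML = b"<doc><title>Report</title><para>one</para><para>two</para></doc>"


class ElementLog(xml.sax.ContentHandler):
    """Records element names in the order SAX reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Only what is already in self.seen is known at this point;
        # nothing later in the document is visible yet.
        self.seen.append(name)


def sax_order(data):
    handler = ElementLog()
    xml.sax.parseString(data, handler)
    return handler.seen


def dom_last_then_previous(data):
    # The whole tree exists before we touch it, so we can jump to the
    # end and then step backwards.
    dom = minidom.parseString(data)
    last = dom.getElementsByTagName("para")[-1]
    previous = last.previousSibling
    return last.firstChild.data, previous.firstChild.data
```

With SAX the application is called as each element goes by. With DOM the
application starts from a finished tree and navigates in any direction.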

From a performance perspective, the effort involved in building a tree
structure (which is what DOM needs) can be expensive, and it is often the
case that the amount of memory you have limits the size of documents you
can process in a complete DOM-ish way.

SAX, on the other hand, uses a bounded amount of memory that is largely
independent of document size, and so can chew through documents many times
bigger than available memory. The downside is that you do not have the full
access to the document structure that DOM gives you.
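
As a rough sketch of that streaming style, assuming Python's standard
xml.sax module and its incremental feed() interface: the document is
handed to the parser a chunk at a time and only a counter is kept. The
generate_chunks helper and the record element are hypothetical stand-ins
for a large input.

```python
import xml.sax


class Counter(xml.sax.ContentHandler):
    """Counts start tags with a given name and keeps nothing else."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def generate_chunks(n_records):
    # Stand-in for a big file or network stream, produced lazily.
    yield b"<records>"
    for i in range(n_records):
        yield b"<record id='%d'>x</record>" % i
    yield b"</records>"


def count_records(chunks, tag="record"):
    handler = Counter(tag)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # each chunk can be discarded after this call
    parser.close()
    return handler.count
```

Note that the memory bound only holds as long as the handler itself does
not accumulate data. A handler that stores everything it sees ends up
with the same footprint as a tree.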

A very important question to ask yourself is "how much of a document
structure do I need to have in memory at any one time?" A lot follows
depending on how you answer that question! If you are a Word user you might
think about it the way you think about master documents or (better), if you
are familiar with Adobe FrameMaker, the "chapter model" it uses to deal
with large documents.
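
One way to picture the chapter-at-a-time approach is sketched below. It
assumes Python's standard xml.dom.pulldom module, which streams through a
document and lets you expand only a chosen element into a DOM subtree.
The book/chapter/p element names and the function name are hypothetical.

```python
from xml.dom import pulldom

BOOK = ("<book>"
        "<chapter n='1'><p>a</p><p>b</p></chapter>"
        "<chapter n='2'><p>c</p></chapter>"
        "</book>")


def paragraphs_per_chapter(xml_text):
    """Count <p> elements per chapter, one chapter subtree at a time."""
    counts = {}
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "chapter":
            # Build a tree for this chapter only, then work on it DOM-style.
            events.expandNode(node)
            paras = node.getElementsByTagName("p")
            counts[node.getAttribute("n")] = len(paras)
    return counts
```

Within a chapter you get tree-style access. Between chapters the
processing stays stream-like, so the answer to "how much do I need in
memory" becomes "one chapter".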

regards,

Sean McGrath
http://seanmcgrath.blogspot.com

References:
- Re: Need advice regarding XML performance issues
  - From: John.Borras@e-Envoy.gsi.gov.uk
- Re: [egov] Re: Need advice regarding XML performance issues
  - From: "Chiusano Joseph" <chiusano_joseph@bah.com>