Subject: Re: [egov] Re: Need advice regarding XML performance issues
Michael:

Performance is a very loaded term. We have had huge debates on this going back to 1996 on the XML-dev list. I will try to recall some of the points we agreed on.

1. Performance is affected largely by platform, programming language and physical memory (both heap and stack).

2. SAX is an event-based model. I have a PPT slide that explains the concept of SAX very clearly at http://www.nickull.net/presentations.html (download the one entitled Washington - Day Three). SAX works by reading in an XML document as a one-dimensional stream of bytes. When enough bytes have been read that an "event" is recognized, an event notice is dispatched up the stack. The event notices are simple messages (in practice, callbacks into your handler) that look something like this:

    StartElement=["foo"];

The above event is the parser's way of telling the parent that instantiated it that it has encountered a start element named "foo". Once the event has been dispatched, no residual memory of the event is kept. This makes SAX a preferable methodology for parsing when there are strict memory requirements.

Since XML does not contain any semantics, a parser is simply a reader. Nothing is done with the XML except reading it, checking it for errors and resolving entities (three mandatory items), plus a fourth optional item: validating it against a DTD or XML Schema. Validation also slows down parsing. The Java SAX implementation (Xerces) accordingly has four main handlers (entity resolver, error handler, DTD handler and content handler, the last being the one that receives the events). It is up to the programmer to capture all the events that get passed up and do something meaningful with them.

*** This is the place where a lot of performance can be gained or lost! Since just about all programs that consume XML documents will eventually do something with them, the skill of the programmer writing the handler code greatly affects things like memory use and speed.
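To make the handler point concrete, here is a minimal sketch using Python's standard xml.sax module rather than Xerces. The class name FooCounter and the helper count_foo are hypothetical names chosen for illustration. The handler reacts to start-element events and keeps only a counter, so memory use does not grow with the size of the document.

```python
import xml.sax


class FooCounter(xml.sax.ContentHandler):
    """Counts start-element events for <foo> without keeping the document."""

    def __init__(self):
        super().__init__()
        self.foo_count = 0

    def startElement(self, name, attrs):
        # Called by the parser once per start tag. Nothing is stored
        # here beyond the running counter.
        if name == "foo":
            self.foo_count += 1


def count_foo(xml_bytes):
    """Parse xml_bytes as a stream of events and return the <foo> count."""
    handler = FooCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.foo_count
```

A handler that instead appended every element to a list would give up most of the memory advantage described above, which is the kind of difference programmer skill makes here.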
If you use a language like Java with automatic garbage collection, memory is managed for you, although you can still tune it further. If you work in a language like C or C++ (ANSI), the skill of the programmer is going to affect your system's performance.

3. If you need to keep a model of the XML document and run a series of programmatic tests against it, you will likely use the DOM. DOM (Document Object Model) works by accepting the events from the SAX handler (although use of SAX is not mandatory) and building an in-memory representation of the original XML document. Tests and queries can then be run against the DOM tree to check for certain conditions, etc. Performance is greatly affected here by what kinds of tests you run against your XML tree. This is a point of contention for those who advocate XML automatically written out from a model, since not all object models will result in XML that is efficient to query. IMHO, a balance has to be struck between the modeller's requirements and those of the programmers and system administrators. Anyway, XML like this:

    <root>
      <tag-one/>
      <tag-two/>
      <tag-three/>
    </root>

will be easier on the processor than this:

    <root>
      <tag-one>
        <tag-two>
          <tag-three/>
        </tag-two>
        <tag-two>
          <tag-three/>
          ...

if you are iterating through a deep tree looking for matches.

Summary: I have studied the performance issues for a lot of years and will attest that it is an extremely complex issue, and the truths about it change almost monthly as parsers are upgraded, new chipsets come out, new APIs to the O/S are used, newer versions of garbage collection (Java, C#) are invented, etc. Staying up to date is almost impossible; however, there is a simple set of rules that can probably get you 85% of the way there. If you are eager to delve into this subject more thoroughly, please contact me offline and I can provide you with some links etc.
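As a rough illustration of the DOM point (point 3), the sketch below uses Python's standard xml.dom.minidom. The two sample documents, the hyphenated element names and the helpers max_depth and find_all are assumptions made for this example. The nested sample is a completed stand-in, not the truncated example from the message. It builds the whole tree in memory and shows how much deeper a recursive walk has to go for the nested shape.

```python
from xml.dom import minidom

# Hypothetical samples; hyphenated names keep them well-formed.
FLAT = "<root><tag-one/><tag-two/><tag-three/></root>"
NESTED = "<root><tag-one><tag-two><tag-three/></tag-two></tag-one></root>"


def max_depth(node, depth=0):
    """Return the deepest element level reached by a recursive walk."""
    deepest = depth
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            deepest = max(deepest, max_depth(child, depth + 1))
    return deepest


def find_all(xml_text, tag_name):
    """Build the full tree in memory, then query it by tag name."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.getElementsByTagName(tag_name)
```

Depth alone does not determine cost, so treat this as a way to inspect document shape and not as a benchmark. Real timings depend on the parser, the query and the platform, as noted in the summary.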
Cheers

Duane Nickull

John.Borras@e-Envoy.gsi.gov.uk wrote:

> TC Members
>
> Can anyone provide any pointers or advice to Michael please?
>
> If there are any commercial sensitivities about your advice then I'll
> leave you to negotiate directly with him for providing that advice.
>
> John
>
> "Mike Hughes" <mwhughes@sandproof.org>
> 31/12/2003 21:12
>
> To: <john.borras@e-envoy.gsi.gov.uk>
> cc:
> Subject: Need advice regarding XML performance issues
>
> Dear Mr. Borras,
>
> I am researching performance issues related to XML and need to speak
> with an expert. I understand that you are Chairman of the OASIS
> e-Government Technical Committee.
>
> I would appreciate it if you could reply to this email with the names
> of people who could provide related input, particularly with regard to
> specific standards and conditions that affect performance very
> adversely. Also, I need to understand what measures, commercial or
> standards-related, are being taken to resolve such performance problems.
>
> Thank you for any assistance you can provide.
>
> Sincerely,
>
> Michael W. Hughes
> Amplicast
> Erie, CO
> USA
>
> PLEASE NOTE: THE ABOVE MESSAGE WAS RECEIVED FROM THE INTERNET.
>
> On entering the GSI, this email was scanned for viruses by the
> Government Secure Intranet (GSI) virus scanning service supplied
> exclusively by Cable & Wireless in partnership with MessageLabs.
>
> GSI users see
> http://www.gsi.gov.uk/main/notices/information/gsi-003-2002.pdf for
> further details. In case of problems, please call your organisational
> IT helpdesk.

--
Senior Standards Strategist
Adobe Systems, Inc.
http://www.adobe.com