
Subject: XDI parsing speed

From: Markus Sabadello <markus.sabadello@xdi.org>
To: OASIS - XDI TC <xdi@lists.oasis-open.org>
Date: Sat, 14 Aug 2010 01:46:06 -0700

Hello,

I did some benchmarking of my parsing code for XDI serialization formats.
As test input I used the PdxExample (http://wiki.oasis-open.org/xdi/PdxExample), serialized in the X3 Simple, X3 Standard, XDI/JSON (the old version), and XDI/XML formats.

The main result is that about 90-95% of the parsing time is spent parsing the XRIs in the graph.
Only the remaining 5-10% goes into parsing the graph structure itself.
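
Because the XRI share dominates, it also caps what any XRI-only optimization can gain (Amdahl's law): if a fraction p of the run time becomes k times faster, the total speedup is 1 / ((1 - p) + p / k). A quick sketch in Python. The function name is only illustrative, and the 10x XRI speedup is an assumed figure, not a measurement:

```python
def overall_speedup(p: float, k: float) -> float:
    """Amdahl's law: overall speedup when a fraction p of the run time
    is made k times faster and the remaining (1 - p) is unchanged."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - p) + p / k)


# p = share of parsing time spent on XRIs (90-95% in the benchmark).
# k = 10 is an assumed, illustrative speedup of the XRI parser alone.
for p in (0.90, 0.95):
    print(f"p={p:.2f}: 10x faster XRI parsing -> "
          f"{overall_speedup(p, 10):.2f}x overall, "
          f"upper bound {1 / (1 - p):.0f}x")
```

With p between 0.90 and 0.95, a 10x faster XRI parser would make overall parsing roughly 5.3x to 6.9x faster. No XRI optimization alone can exceed about 10x to 20x, because the graph-parsing share is left untouched.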

The XRI parsing code was auto-generated from the XRI 3.0 ABNF using the aParse library (http://www.parse2.com/).
aParse is a generic ABNF library, not one optimized for the XRI 3.0 ABNF, so the resulting XRI parser is slow.

Joseph, Drummond and I have been discussing this; here are some ideas:
- Manually write a more efficient parser that handles the most common cases, and invoke the full-featured parser only for more complex XRIs.
- Keep a cache of already-parsed XRIs, since the same XRIs may appear multiple times in a graph.
- Use "delayed parsing", i.e. treat XRIs as plain strings until their internal semantics are actually needed, and only then invoke the parser. The drawback of this approach is that an XDI server may end up accepting and storing XDI that is actually invalid.
- Manually optimize obvious inefficiencies in the auto-generated code, e.g. by pre-compiling regexes and avoiding wasteful string allocation.
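
Several of these ideas combine naturally. Below is a minimal Python sketch, under loudly stated assumptions, of a fast path with fallback, a bounded cache, a pre-compiled regex, and a lazy wrapper. _SIMPLE_XRI is a hypothetical "common case" pattern made up for illustration; it is NOT derived from the XRI 3.0 ABNF. generated_full_parse is a placeholder standing in for the aParse-generated parser, whose real API is not shown here. ParsedXri, parse_xri and LazyXri are illustrative names only.

```python
import re
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedXri:
    text: str
    subsegments: Tuple[str, ...]
    via_fast_path: bool


# ASSUMPTION: the "common case" is a global context symbol followed by simple
# subsegments introduced by * or !, e.g. "=markus", "+email", "@!1234*5678".
# Pre-compiled once at import time instead of on every call.
_SIMPLE_XRI = re.compile(r"[=@+$!][A-Za-z0-9._-]*(?:[*!][A-Za-z0-9._-]+)*")
_SUBSEG = re.compile(r"[=@+$!*][A-Za-z0-9._-]*")


def generated_full_parse(text: str) -> ParsedXri:
    """HYPOTHETICAL stand-in for the full parser generated from the XRI 3.0
    ABNF. Here it only rejects empty input; a real one validates the grammar."""
    if not text:
        raise ValueError("empty XRI")
    return ParsedXri(text, (text,), via_fast_path=False)


@lru_cache(maxsize=4096)  # bounded cache: repeated XRIs are parsed only once
def parse_xri(text: str) -> ParsedXri:
    """Hand-written fast path for common XRIs, full parser for everything else."""
    if _SIMPLE_XRI.fullmatch(text):
        return ParsedXri(text, tuple(_SUBSEG.findall(text)), via_fast_path=True)
    return generated_full_parse(text)


class LazyXri:
    """Delayed parsing: keeps the raw string and parses on first access.
    Invalid XRIs are only detected at that point (the trade-off noted above)."""

    __slots__ = ("text", "_parsed")

    def __init__(self, text: str) -> None:
        self.text = text
        self._parsed: Optional[ParsedXri] = None

    @property
    def parsed(self) -> ParsedXri:
        if self._parsed is None:
            self._parsed = parse_xri(self.text)
        return self._parsed


if __name__ == "__main__":
    graph_xris = ["=markus", "+email", "=markus", "@!1234*5678", "(+complex/xref)"]
    results = [parse_xri(x) for x in graph_xris]
    print([r.via_fast_path for r in results])  # [True, True, True, True, False]
    print(parse_xri.cache_info())
```

One design point: the fast path must only accept strings that the full grammar would also accept, otherwise it becomes a validation hole. Note also that lru_cache does not cache exceptions, so an invalid XRI is re-parsed each time it appears.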

Markus