Subject: [xslt-conformance] Another lead on HTML output comparison
On an Apache mailing list, someone suggested looking at the "JavaCC HTML Parser" at http://www.quiotix.com/downloads/html-parser/

Quoting that page:

"This is a JavaCC grammar for parsing HTML documents. It does not enforce the DTD, but instead builds a simple parse tree which can be used to validate, reformat, display, analyze, or edit the HTML document. The goal was to produce a parse tree which threw away very little information contained in the source file, so that by dumping the parse tree, an almost identical copy of the input document would result. The only source information discarded by the parser is whitespace inside of tags (i.e., the spaces or newlines between the attributes of a tag.) It is not confused by things that look like tags inside of quoted strings."

Maybe we could build on that foundation?

.................David Marston
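To make the parse-then-compare idea concrete, here is a rough sketch. It does not use the Quiotix parser (that is a JavaCC grammar for Java). It uses Python's standard html.parser module as a stand-in. The names TokenCollector, tokens_of and same_html are made up for illustration. It also assumes that attribute order and runs of whitespace in text do not matter for comparison, which would be wrong inside elements such as PRE.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Hypothetical helper: flatten an HTML document into comparable tokens."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []
        self._text = []  # raw text pieces seen since the last non-text token

    def _flush_text(self):
        # Assumption: collapse whitespace runs and drop whitespace-only text.
        text = " ".join("".join(self._text).split())
        self._text = []
        if text:
            self.tokens.append(("text", text))

    def handle_starttag(self, tag, attrs):
        self._flush_text()
        # Whitespace between attributes is already gone at this point.
        # Assumption: attribute order is not significant, so sort by name.
        normalized = tuple(sorted((name, value or "") for name, value in attrs))
        self.tokens.append(("start", tag, normalized))

    def handle_endtag(self, tag):
        self._flush_text()
        self.tokens.append(("end", tag))

    def handle_comment(self, data):
        self._flush_text()
        self.tokens.append(("comment", data))

    def handle_data(self, data):
        self._text.append(data)


def tokens_of(html_text):
    """Parse html_text and return its normalized token list."""
    collector = TokenCollector()
    collector.feed(html_text)
    collector.close()
    collector._flush_text()  # emit any trailing text
    return collector.tokens


def same_html(actual, expected):
    """True if both documents produce the same normalized token list."""
    return tokens_of(actual) == tokens_of(expected)
```

A test harness could call same_html(actual_output, reference_output) for each test case instead of comparing the files byte for byte. Whether this normalization is too loose or too strict for conformance testing is an open question; the sketch only shows the shape of the approach.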