Subject: some draft on infoset based comparison
Here is a draft in the state it was in a month ago. I will follow up on this soon.

-----Original Message-----
From: Kirill Gavrylyuk
Sent: Friday, February 23, 2001 6:35 PM
To: 'G. Ken Holman'
Subject: RE: Draft drawing

Thank you very much, Ken!

While writing up the process, a couple of questions came up for me about HTML and text outputs. Text seems doable, whereas HTML comparison is hard. Below is a draft of ideas.

------------------------------
xsl:output="XML" case:

The criterion is a comparison of the subset of the Infoset of the expected and actual documents that is relevant for XSLT output comparison. For example, this subset will not include prefixes, entity information, etc.

The Committee gives a format for an XML-based description of this Infoset subset (call it the T' format). This subset is accessible from XPath, so the Committee gives an XSLT stylesheet for the T'-transformation. The Committee also gives the XML input and the expected output documents transformed to T' format.

The test lab that does the actual testing will receive the actual output from the XSLT processor being tested and transform it to T' format using that same XSLT processor. It is then up to the test lab how to compare the T' forms of the actual and expected output. They can be canonicalized to remove the attribute-order problem, or just loaded into a DOM and written out with a consistent proprietary serialization, whatever works. But any test failures should be reported based on the difference between the T' forms of the actual and expected output, in terms of Infoset elements.

This way we remove any dependency on a specific parser or platform. The only weak point is that if a test fails, it might be an actual test failure or a failure to produce the T' transform. This is planned to be resolved by a set of "smoke" tests targeting the question "Is the parser testable at all?".

-----------------------------------
xsl:output="html" case:

I have a problem here so far. We could go with the same idea as in the "xml" case, i.e. build an Infoset description of the HTML output.
There are a couple of problems though:

- An HTML document generally cannot be loaded into an XML/XSLT parser, due to reduced forms of empty elements (<br/> -> <br>), boolean attributes in minimized form, case insensitivity of element names (<br> or <BR>), characters not escaped in <script>, etc.
- Control delimiters like <HTML> or <BODY> might be added even if not specified in the input.
- Other issues.

These issues could be addressed by some specific half-textual parsing coupled with XML parsing, but this gives us a dependency on specific code or a specific platform. I've heard that there is licensed software that compares HTML documents for equivalence in terms of how they render in both IE and Netscape, though it won't help with the <script> element. I can't even reduce it to text, as spaces and attribute order are at the discretion of the XSLT processor's output.

------------------------------------
xsl:output="text" case:

The simplest idea is to give the lab a so-called I' form of the text output: a document in UTF-8 with a separate text node for, say, each line, and with the actual encoding specified as an attribute of the root.

A more interesting approach would be to use the fact that the result of text output is the combination of all text nodes from the XML output in document order. So we can reduce the conformance test for "text" output to testing only this statement of the spec.

Thanks, Kirill!
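To make the "xml" case a bit more concrete, here is a rough sketch of the kind of reduction the T' form is meant to perform. It is written in Python rather than as the XSLT stylesheet the draft calls for, purely for illustration. The function names (split_name, to_t_prime, same_t_prime) and the nested-tuple representation are assumptions of this sketch, not the Committee's T' format. It also relies on the default ElementTree parser, which expands entities and drops comments and processing instructions, so it says nothing about how T' should treat those.

```python
import xml.etree.ElementTree as ET


def split_name(clark_name):
    """Split ElementTree's '{uri}local' name into (namespace URI, local name)."""
    if clark_name.startswith("{"):
        uri, local = clark_name[1:].split("}", 1)
        return uri, local
    return "", clark_name


def to_t_prime(element):
    """Reduce an element to a nested tuple (hypothetical stand-in for T').

    Prefixes are already gone (the parser resolves them to namespace URIs),
    entities are already expanded, and attributes are sorted so that their
    original order cannot affect the comparison.
    """
    attrs = tuple(sorted((split_name(k), v) for k, v in element.attrib.items()))
    children = []
    if element.text:
        children.append(element.text)
    for child in element:
        children.append(to_t_prime(child))
        if child.tail:
            children.append(child.tail)
    return (split_name(element.tag), attrs, tuple(children))


def same_t_prime(expected_xml, actual_xml):
    """Compare two serialized documents by their reduced forms only."""
    expected = to_t_prime(ET.fromstring(expected_xml))
    actual = to_t_prime(ET.fromstring(actual_xml))
    return expected == actual
```

With this sketch, two outputs that differ only in prefix choice, attribute order, or how a character is escaped compare as equal, while a changed attribute value does not.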
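For the "text" case, the second approach could be checked along these lines. This is only a sketch in Python, assuming the XML-method output is a well-formed document with a single root element and that both outputs have already been decoded to strings; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_nodes_in_document_order(xml_output):
    """Concatenate all text nodes of the XML output in document order."""
    root = ET.fromstring(xml_output)
    return "".join(root.itertext())


def text_output_matches(xml_output, text_output):
    """Check that the text-method output equals the concatenated text nodes."""
    return text_nodes_in_document_order(xml_output) == text_output
```

Note that escaping differs between the two methods: the XML output carries "&lt;" where the text output carries "<", which the parser resolves before the comparison.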