xslt-conformance message

Subject: some draft on infoset based comparison
From: Kirill Gavrylyuk <kirillg@microsoft.com>
To: xslt-conformance@lists.oasis-open.org
Date: Wed, 18 Apr 2001 10:47:03 -0700
Here is the draft as it stood a month ago. I will follow up on this
soon.

-----Original Message-----
From: Kirill Gavrylyuk 
Sent: Friday, February 23, 2001 6:35 PM
To: 'G. Ken Holman'
Subject: RE: Draft drawing


Thank you very much, Ken!
While writing up the process, a couple of questions came up for me about
HTML and text outputs. Text seems doable, whereas HTML comparison is
hard. Below is a draft of ideas.
------------------------------
	xsl:output method="xml" case: The criterion is a comparison of
the subset of the Infoset of the expected and actual documents that is
relevant for XSLT output comparison. For example, this subset will not
include prefixes, entity information, etc.
	The Committee provides a format for an XML-based description of
this Infoset subset (call it the T' format). The subset is accessible
from XPath, so the Committee provides an XSLT stylesheet for the
T'-transformation.
	The Committee provides the XML input and the expected output
documents transformed to T' format.
	The test lab that does the actual testing will receive the actual
output from the XSLT processor being tested and transform it to T'
format using that same XSLT processor.
	Then it is up to the test lab how to compare the T' forms of the
actual and expected output. They can be canonicalized to remove the
attribute order problem, or just loaded into a DOM and written out in a
consistent proprietary form, whatever. But any test failures should be
reported based on the difference between the T' forms of the actual and
expected output, in terms of Infoset items.
	This way we remove any dependency on a specific parser or
platform. The only weak point is that when a test fails, it might be an
actual test failure or a failure to produce the T' transform. This is
planned to be resolved by a set of "smoke" tests targeting "Is the
processor testable at all".
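
To make the comparison step concrete, here is a minimal sketch in
Python. It is only a stand-in for illustration: the proposal above does
the T' transformation with an XSLT stylesheet, and the names t_prime and
same_t_prime as well as the tuple layout are my assumptions, not a
Committee format. It ignores comments and processing instructions,
which a real T' format would probably need to cover.

```python
import xml.etree.ElementTree as ET

def t_prime(elem):
    """Reduce an element to a hypothetical T'-like nested tuple.

    ElementTree reports names as {namespace-uri}local, so prefixes
    drop out, and character/predefined entity references arrive
    already expanded. Attributes are sorted to remove ordering.
    """
    items = []
    if elem.text:
        items.append(("text", elem.text))
    for child in elem:
        items.append(t_prime(child))
        if child.tail:
            # Text following a child belongs to the parent's content.
            items.append(("text", child.tail))
    attrs = tuple(sorted(elem.attrib.items()))
    return ("element", elem.tag, attrs, tuple(items))

def same_t_prime(expected_xml, actual_xml):
    """True if both documents reduce to the same T'-like structure."""
    expected = t_prime(ET.fromstring(expected_xml))
    actual = t_prime(ET.fromstring(actual_xml))
    return expected == actual
```

With this reduction, two outputs that differ only in prefix choice,
attribute order, or the way "&" is escaped compare as equal, while a
difference in text or structure does not.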
-----------------------------------	

	 xsl:output method="html" case: I have a problem here so far. We
could go with the same idea as in the "xml" case, i.e. build an Infoset
description of the HTML output. There are a couple of problems though:
		- An HTML document generally is not loadable into any
xml/xslt parser because of reduced forms of empty elements (<br/> ->
<br>), boolean attributes in minimized form, case insensitivity of
element names (<br> or <BR>), unescaped characters in <script>, etc.
		- Enclosing tags such as <HTML> or <BODY> might be added
even if not specified in the input.
		- Other issues.
	These issues could be addressed by some specific half-textual
parsing coupled with XML parsing, but that gives us a dependency on
specific code/platform. I've heard that there is licensed software
that compares HTML documents for equivalence in terms of how they
render in both IE and Netscape, though it won't help with the <script>
element. I can't even reduce the comparison to text, as whitespace and
attribute order are at the discretion of the XSLT processor's output.
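
For illustration only, the "half-textual parsing" mentioned above could
look roughly like the following Python sketch built on the standard
html.parser module. It has exactly the drawback described: it ties the
comparison to one specific parser. The class name, the VOID set, and the
event tuples are my assumptions. Note that it collapses whitespace
everywhere, which would be wrong inside <pre> or <script>.

```python
from html.parser import HTMLParser

# Elements treated as empty in this sketch (a partial, assumed list).
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col"}

class Events(HTMLParser):
    """Collect a normalized event list from HTML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self._buf = []

    def flush(self):
        # Emit buffered character data with whitespace collapsed.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            self.events.append(("text", text))

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        # A minimized boolean attribute has value None; expand it.
        self.flush()
        norm = tuple(sorted((k, k if v is None else v) for k, v in attrs))
        self.events.append(("start", tag, norm))
        if tag in VOID:
            self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        if tag not in VOID:
            self.events.append(("end", tag))

    def handle_endtag(self, tag):
        self.flush()
        if tag not in VOID:
            self.events.append(("end", tag))

    def handle_data(self, data):
        self._buf.append(data)

def html_events(text):
    """Return the normalized event list for an HTML string."""
    parser = Events()
    parser.feed(text)
    parser.close()
    parser.flush()
    return parser.events
```

Two outputs would then be compared by checking html_events(expected) ==
html_events(actual). This covers <br> versus <br/>, tag case, attribute
order, and minimized boolean attributes, but not tags such as <HTML> or
<BODY> that a processor adds on its own.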
------------------------------------

	 xsl:output method="text" case: The simplest idea is to give the
lab a so-called I' form of the text output: a document in UTF-8 with a
separate text node for, say, each line, and with the actual encoding
specified as an attribute of the root.
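
As a rough sketch of what producing such an I' form might look like, in
Python: the element names "text-output" and "line" and the attribute
name "encoding" are placeholders I made up, since no concrete format has
been defined yet. The sketch splits on "\n" only and assumes the text
contains no characters that are illegal in XML.

```python
import xml.etree.ElementTree as ET

def to_i_prime(raw_bytes, encoding):
    """Wrap text output in a hypothetical I' document.

    The result is UTF-8 XML with one <line> element per line of text
    and the original encoding recorded on the root element.
    """
    text = raw_bytes.decode(encoding)
    root = ET.Element("text-output", {"encoding": encoding})
    for line in text.split("\n"):
        ET.SubElement(root, "line").text = line
    # tostring with an explicit encoding returns bytes.
    return ET.tostring(root, encoding="utf-8")
```

The lab could then compare the I' documents of expected and actual
output line by line without caring about the encoding the processor
actually wrote.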
	A more interesting approach would be to use the fact that the
result of text output is the concatenation of all text nodes of the xml
output in document order. So we can reduce the conformance test for
"text" output to testing only this statement of the spec.

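The check that text output equals the text nodes of the xml output could
be sketched like this in Python. The function names are hypothetical,
and the sketch assumes the same stylesheet and input were run once with
the xml method and once with the text method.

```python
import xml.etree.ElementTree as ET

def text_of(xml_output):
    """Concatenate all text nodes of an XML document in document order."""
    return "".join(ET.fromstring(xml_output).itertext())

def text_output_matches(xml_output, text_output):
    """True if the text-method output equals the XML output's text."""
    return text_of(xml_output) == text_output
```

For example, text_of('<r>a<b>b &amp; c</b>d</r>') gives 'ab & cd', so
escaping in the xml output does not affect the comparison.
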
Thanks, Kirill!