Subject: Thinking about comparing output
Carmelo sent me an email in which he mentioned some dilemmas relating to comparing the output of a test run against the prescribed output. As I was thinking about presenting my take on the issues, I decided to send this to the whole list. (I have also sent this to my colleague Shane Curcuru, who thinks about this a lot on behalf of Lotus. He may send along his thoughts.)

The possible outputs of interest from a test case can be broken down into a tree-like structure. I had earlier proposed that each test be annotated with a "scenario" parameter to describe the framework in which it should be run, which mostly means the inputs and outputs. Additional "output" parameters would name the particular outputs that must be compared, and there could be more than one. This is in Part 4, "Operational Parameters," of my memo entitled "Test Case Markup, Straw Man edition," sent on 9/5/2000.

To some extent, this discussion gets into areas that we may choose to leave up to the discretion of the test labs that use our suite. I don't believe we have complete precision on the location of that dividing line. We should go beyond the line in this analysis to ensure that submitters provide adequate information with their tests.

Here's the tree of output possibilities:

THE INTERESTING OUTPUT IS THE TRANSFORMATION RESULT

In all cases in this group, the easy approach is to send output to a file, but that introduces a certain amount of post-XSLT processing into the test. We could add some questions about "serialization to a file" to our questionnaire for developers, or perhaps have a mini-suite of tests that reveals the details of that serialization.

-1- XML output

We could apply XML Canonicalization (see http://www.w3.org/TR/xml-c14n for the latest draft) and then do a byte-wise comparison. If the processor supports SAX events as a form of output, one could avoid many of the file-output issues by just responding to those events, but how do you introduce the representation of the correct output?
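The canonicalize-then-compare idea for XML output could look something like the sketch below. This is only an illustration, using the C14N support in the Python standard library (3.8+); the file names and the function name are placeholders, not part of any proposed catalog format:

```python
# Minimal sketch: compare a test run's XML output against the prescribed
# output by canonicalizing both documents and then comparing byte-for-byte.
# Canonicalization normalizes details such as attribute order, empty-element
# syntax, and character references that do not affect the infoset.
from xml.etree.ElementTree import canonicalize

def xml_outputs_match(actual_path, expected_path):
    """Return True if the two XML documents are identical after C14N."""
    with open(actual_path, encoding="utf-8") as f:
        actual = canonicalize(f.read())
    with open(expected_path, encoding="utf-8") as f:
        expected = canonicalize(f.read())
    return actual == expected
```

For example, `<doc a="1" b="2"><x/></doc>` and `<doc b="2" a="1"><x></x></doc>` compare equal under this scheme, even though their byte streams differ.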
Similarly, one could try to verify a static in-memory representation of the output.

-2- HTML output

I suspect there are tools that tell you whether two HTML files will produce the exact same appearance when viewed in a browser, though I can't name any. We also need to check correct generation of invisible items like comments, leading back to many of the same issues as with XML.

-3- Text output

This output can vary so much that some form of byte-wise comparison seems inevitable. There could be line-ending issues when the output is a file.

-4- Future: several outputs of mixed type

The W3C Working Group favors formalization of multiple output documents, with varying formats among them. See http://www.w3.org/TR/xslt11req and keep in mind that this is for 1.1, so it's coming soon. Thus, any scheme for designating the type of output (and hence the method of output comparison) should be flexible enough to allow multiple outputs of multiple types.

THE INTERESTING OUTPUT IS ACTIVITY IN THE OPERATING SYSTEM

-5- Processor raises error

The first question is how automatic this should be. Detecting the fact of an error drags in variables such as the operating system and/or the language in which the processor was implemented. Capturing the error message and isolating the interesting portion might also involve operating-system specifics. And do we dare to address the content of the messages? The person who is choosing a processor based on the results of this test battery could very well want to know how well the error messages lead to the problem, but that's not conformance. At a minimum, I think that submitters of deliberate-error test cases should include a sample message text that is as specific as possible.

-6- Processor sends "message" somewhere

This provision is specifically for tests of xsl:message, which is granted wide discretion about where the message goes. Information about where it goes and how it's serialized would have to be solicited on the developer questionnaire.
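As a concrete illustration of what a harness might do for the two cases above, here is a minimal sketch that runs a processor as a child process and captures its exit code, standard output, and standard error. Everything here is an assumption: the spec does not say where error or xsl:message text goes, and the actual channel is exactly the kind of detail the developer questionnaire would have to supply:

```python
# Hypothetical harness fragment: run an XSLT processor command line as a
# child process and capture its exit status and both output streams. A real
# harness would substitute the processor's actual invocation and consult the
# developer questionnaire for which stream carries error/message text.
import subprocess

def run_processor(cmd):
    """Run the given command line; return (exit_code, stdout_bytes, stderr_bytes)."""
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode, result.stdout, result.stderr
```

A nonzero exit code could then serve as the automatic "error was raised" signal for -5-, while the captured stream bytes feed whatever comparison is chosen for -6-.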
But the test case could specify an exact byte stream (except perhaps for line-ending characters) that should be emitted.

-7- Both of the above

This, too, applies to xsl:message, when you are testing the terminate option. This could be a separate scenario or may simply arise from specifying multiple expected outputs as a blend of -5- and -6- above.

THE INTERESTING OUTPUT IS BOTH

At this time, I'm not convinced that we need to explore this too heavily. Given that both of the main branches above have the potential for multiple outputs, we cover this area adequately if we don't create clashes of notation.

ANOTHER ISSUE:

Carmelo speculated about storing the output stream(s) directly in the test catalog. I think that raises numerous operational difficulties, such as when a lab tries to transform the catalog to use its data in a test harness. He also mentioned having a pointer to a file containing the correct output. I think that having the correct output in a file would still allow forms of comparison other than file-to-file, if the harness developer so decides.
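The "exact byte stream, except perhaps for line-ending characters" comparison suggested above for text and xsl:message outputs could be sketched as follows; this is one possible harness-side convention, not a mandated one:

```python
# Minimal sketch: byte-wise comparison of two output streams that treats
# CR, LF, and CRLF line endings as equivalent, so a test passes regardless
# of the platform's line-ending convention.
def bytes_match_ignoring_line_endings(actual: bytes, expected: bytes) -> bool:
    def normalize(data: bytes) -> bytes:
        # Collapse CRLF first so it is not turned into two LFs.
        return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalize(actual) == normalize(expected)
```

Under this rule, `b"line1\r\nline2"` and `b"line1\nline2"` compare equal, while any other byte difference still fails the test.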