xslt-conformance message



Subject: Re: Halfway to genericizing the test case catalog



About date data:
I'm anticipating that submitters will send updated test suites in the
future, possibly for XSLT 2.0. The submitters will have a provision for
tracking the revision date of each individual test case. We only care
about that when we are accepting later submissions and want to avoid
re-reviewing tests that we reviewed before. Therefore, I'm comfortable
leaving supplied dates unmodified.

Ken suggests we may want to apply a date to the whole submission, such as
the date we received it. That's okay with me, but our reasons for wanting
such a date may actually apply to each individual case, as described above.

More about Identifier:
>><!-- Identifier uses forward slashes as separators, begins with the name
>> of a directory that is directly within the top directory named per Title,
>> and ends with the name-part in Source. -->
>><!ELEMENT Identifier ( #PCDATA ) >
GKH>If we remove Title, as I think we should, then the comment above would
GKH>change.

I don't think so. The Identifier does not include the Title, so it doesn't
care about the name of the top-level directory of the tree. I picture the
submission coming as a single directory tree of all test cases and other
input files, so it will have a directory name at the top, but that
doesn't require that we use the top-level name as supplied. But we must
retain the rest of the supplied tree structure, or else we would get into
the business of modifying tests to change things like file paths in
xsl:import directives.
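
To illustrate (the directory and file names here are invented, not from
any actual submission): if a submitted tree arrives as

   submitted-suite/
     axes/
       axes01.xsl
       axes01.xml
     string/
       string05.xsl
       string05.xml

then the Identifier for the first case is axes/axes01 -- it starts one
level below the top directory, whatever we end up calling it, and ends
with the name-part given in Source.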

We could relax the requirement for the submitted catalog to have the
Title element, and possibly "category" values for each test, yet impose
on ourselves the requirement that the catalog we ship has them.

Creator data:
>Extending this to the entire suite, I've added a Creator and Date
>children to <test-catalog> to record the information regarding the entire
>collection (again from the submitter's POV).

I discussed Date above. We should be clearer about what would go in a
whole-suite Creator field. I think it wouldn't really be the creator(s)
in the authorship sense, but rather the person who sent it in, in which
case "contact" is a better term. Could a submitter wish to keep their
contact information (email address) hidden? How about calling it
"submitter" and expecting the name of the organization? Or having both
"submitter" and "contact"?

Validating the catalog:
>If we decide that we will still need a validating XML
>processor to validate the structure of our submitted catalogues...

What we need is to ensure that the catalogs can be merged and that
the Test Lab can perform a rendition specific to the test platform and
processor's discretionary choices. I think that means that we have to
either check structure or be responsible for fixing catalog bugs.

Why test cases need to specify discretionary behavior:
>Okay, now during my prototyping I don't see why a test file would specify
>behaviour ... the discretionary document describes possible behaviours,
>not the individual test.

The test case is written to assume one particular choice. For example,
consider a test case that has <xsl:element name=" bad name!"> and comes
with a correct-output file showing the pass-through behavior. It must
only be in the rendered suite when testing a processor that chose the
pass-through option. There may be a parallel test, or an equivalent test
in a different suite, to test the raise-error behavior (the other option)
and it would have catalog data indicating that an error occurs. The
pass-through option must be implemented in a specific way, so it is
possible that a buggy processor will raise an error it didn't intend to
raise, or will instantiate the wrong content where the bad element was
requested.
Therefore, if the processor under test intends to do pass-through, its
output can be compared against the correct output, and it can pass or
fail that case.
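
A concrete sketch of that example (file contents abbreviated, and the
file names are up to the submitter):

   Test stylesheet fragment:
     <xsl:element name=" bad name!">
       <got-here/>
     </xsl:element>

   Correct output for the pass-through choice (the content is
   instantiated without the bad wrapper element):
     <got-here/>

   For the raise-error version there is no correct-output file; its
   catalog data says that an error is expected instead.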

Discretionary contrasted with gray areas:
>Given the possible transient nature of a gray area to a discretionary
>area,

Actually, that very seldom happens. Usually when the WG resolves a gray
area, they specify one behavior as correct. In the xsl:number cases for
our prototype, there was originally a gray area about what to do when a
number is negative, especially concerning A and I formats. The erratum
dictates one behavior. A test case that anticipates this behavior can
be encoded both as exercising the gray area and, once the erratum
applies, as an ordinary non-gray test. I'm sure it will be a pain! Read on!
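
To make the xsl:number situation concrete, the gray area concerns cases
like

   <xsl:number value="-1" format="I"/>

where the original Recommendation did not say what to output; the
correct output for such a test is whatever the erratum dictates. A test
written to that ruling would cite the erratum in its catalog data, and
the same case can be flagged as gray for processors that predate the
resolution.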

>...does it make sense to just call them all discretionary and the
>verbiage associated with each will acknowledge their status?

No! Gray areas will probably be subject to repeated fine-tuning. The set
of discretionary items is fixed (barring errata creating new ones) and
represents conscious intent of the WG. Every gray area is a mistake on the
part of the WG, so naturally there is no complete list of all of them.
The #1 reason to keep them separate is to not pollute the discretionary
list with all the transient junk in the gray list. We can hope that in
future specs, WGs will actually include a normative list of the
discretionary choices as an appendix.

More about operations:
>Regarding "operation", this would force a submitter to constrain
>themselves to what the committee expects to be allowed ...
>I guess that is okay ...

Think of it as environmental variations that are acknowledged in the
spec. It does take careful brain-work from the Committee and/or the
WG writing the spec, but it should be universal. Consider this: in the
ten months (!!) since the original Straw Man came out, nobody has come
forward to identify any XSLT operational scenarios other than the
original three: standard, embedded stylesheet, and parameterized.
Those three can be discerned in the original spec, but we have the
uncomfortable situation that the WG did not produce a list of calling
scenarios. (The spec also allows the processor to check the locale
(internationalization settings) in which it's running in very narrow
circumstances, but I think those are all "should" provisions anyway.)
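
As a sketch of how catalog data might express one of those scenarios
(the element and attribute names are placeholders, nothing here is
settled):

   <scenario operation="standard">
     <input-file role="data">catalogtest.xml</input-file>
     <input-file role="stylesheet">catalogtest.xsl</input-file>
     <output-file compare="XML">catalogtest.out</output-file>
   </scenario>

The embedded and parameterized scenarios would use operation="embedded"
and operation="parameterized" (or whatever keywords the configuration
instance fixes), with whatever extra children they need.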

I think that Ken's uncertainty, expressed above, can be tied to a
larger question of how far we go in defining Software Quality
Assurance or testing practices. If we don't have a list of calling
scenarios from the WG, and we don't create one, then the users of our
suite face a real hassle in assessing the union of the scenarios that
came in, uncoordinated, from the submitters. Isn't it fair for us to
side with the Test Labs and push the submitters to conform to a fixed
list of scenarios? That encourages them to think about their test cases
as processor-independent conformance tests, which is what we need.

Splitting message compares away from others:
I would like to revisit this question when we have more experience with
the prototype. (As indicated earlier, we may have to make compare be an
attribute of each individual output file, rather than the scenario as a
whole. Look at xsl:document as proposed for XSLT 2.0 and you'll see what
I'm anticipating.) But our catalog is only at the Iron/Germanium stage
now, meaning that it is time to really exercise it. I made message
compares look very similar because I expect the console output to be
captured into a file, but other specs tested under this regime might
have numbered errors or some other precise expression that the WG
creates. Another reason to wait is that we would
benefit from feedback from test labs about how automated they want such
cases to be.
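
Roughly what I mean by a per-output-file compare, using the same
placeholder names as in the sketch above:

   <output-file compare="XML">main-result.xml</output-file>
   <output-file compare="XML">second-result.xml</output-file>
   <output-file compare="message">console.log</output-file>

so a case with multiple outputs (the xsl:document situation) could mix
an XML comparison with a message comparison in one scenario.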

Responsibility for environment preparation:
>>The Committee could push responsibility to the processor developer to
>>provide a script/batch mechanism to take values
>>from standardized data and map them to the specific syntax of their
>>processor.
>Can we not leave this to the testing organization?  I think it is out of
>scope of our committee work.

The above was written as a generic form of the issues about setting
up input. To instantiate it for our committee, think about setting values
for top-level stylesheet parameters. The processor developer is
absolutely responsible for stating how it is done with their product.
However, certain test cases come with parameter-setting data (format
TBD, as we all know). The Test Lab has to take the data that came in
our test suite and transform it into API calls or command-line options
or whatever to set the parameters as needed by the processor under test.
If the Test Lab simply relies upon flimsy documentation, they may get
it wrong in subtle ways on some processors, and those processors may
look worse in the test results because of it. Thus, the developer has an
incentive to ensure that all labs understand how parameters are set for
their product. We, on the other hand, want to ensure that our tests are
repeatable: that different labs working independently will obtain the
exact same results for a given processor and environment when using
the same version of our suite. I'm suggesting by the above verbiage that
a committee can make the vendors aware of their incentives.
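
For example, a parameterized test case might ship with parameter data
along these lines (the format really is TBD; this is only a
placeholder):

   <param name="rows" value="7"/>

It is then the Test Lab's job to turn that into rows=7 on one
processor's command line, an API call on another, and so on, using
whatever syntax each vendor documents.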

Generic terms for OASIS committees:
>>Additional "scenario" keywords can be devised as necessary,
>>but OASIS should control the naming.
>The configuration instance can control that.

We're saying the same thing here. A given Committee using this
design for a test suite looks at their spec and develops a master
list of testing scenarios, both regarding "operation" and "compare"
aspects. They set their configuration instance accordingly.
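
A sketch of such a configuration instance (all names hypothetical):

   <suite-configuration>
     <operation name="standard"/>
     <operation name="embedded"/>
     <operation name="parameterized"/>
     <compare name="XML"/>
     <compare name="text"/>
     <compare name="message"/>
   </suite-configuration>

The merge and rendition tools would then be able to reject any catalog
entry that uses a keyword outside that list.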
.................David Marston


