Subject: Halfway to genericizing the test case catalog


Below is the generic version of the design for catalogs of test cases. As
you can see, the document is now written for future OASIS committees to
customize. The next step is for Ken to show us how he would deal with the
parts that need to be customized for each type of processor testing regime.
Quoting Ken's message of June 12th: "I propose we genericize the value to
simple CDATA and have a supplemental expression of what attribute values or
value pairs (or other) are allowed in the XSLT/XPath use of the test
suite." All such parts are marked with !*! below.

Before our July 11th meeting, I hope to have a galvanized Iron Man which
will be either (1) Ken's modified version of what's below plus the
"supplemental expression" that produces the form customized for XSLT, or at
least (2) a mild revision of the June 11th Iron Man incorporating the outer
<test-catalog Title="xxx"> element. I broadened the verbiage about setting
parameters, and the galvanized Iron Man would be likewise changed. (Keep in
mind that the parameter-setting mechanism, Question 10 of our design
questions from January, is one we chose to postpone.) I'm also considering
making Date an attribute hanging directly off the test-case element, as the
category does--would this make automated insertion any harder?

Ken's prior feedback also included this statement: "My personal convention
regarding the attribute/element religion is that #PCDATA is *reserved* for
human-readable text and attributes are *reserved* for machine-readable
text, and I coerce my models to meet these very strict constraints." My
religion says that with XML, especially the way we use it, it's hard to say
when some information is really exclusively for human use. However, the
reason I have retained most data in elements is more pragmatic: the
multiple instances of compound data, such as multiple spec-citations, can
be more easily handled as elements. I think the machine has greater needs
for structured compound data than does the human. Readers are invited to
submit their own "religious" statements, but what we really want are
rational reasons. Dublin Core can be read either way, since they use
"attribute" and "element" interchangeably when referring to their data
items.

I didn't have time to investigate the metallic properties of germanium; I
just chose it as the closest word to "generic" among the elements. If any
reader is a closet Materials Scientist, please advise about appropriate
substances for the analogy.
.................David Marston

Test Case Markup & Cataloging, Germanium Man edition

This document is the "Iron Man" document adapted for generic use. Parts to
be genericized are delimited by !*! markers. Even in its generic form, this
catalog should only be considered suitable for testing of a unitary piece
of software, like a processor, that takes inputs and produces outputs
according to a specified invocation. An example of what it does not handle
would be a dialog between two systems, with each system being put into
different states as the dialog progresses.

To produce a design for a particular instance of processor testing, one
must adapt this document as follows:
A. Produce a union list of all the normative documents that affect
conformance testing of this type of processor, and give them short names.
(Check whether earlier OASIS committee work has already set some short
names. Re-use is good!)
B. List all the scenarios for running test cases and comparing output
against the "correct" output. Assign names for "operation" and "compare"
aspects of the scenarios.
C. Decide whether there is a default assumption about the input files. For
example, in testing an XSLT processor, we assume that the inputs are
<Source>.xsl (the stylesheet) and <Source>.xml (the data) unless the
catalog data dictates otherwise. If no default rule exists, then the
input-file element of the scenario becomes required.
D. Decide whether there is a default assumption about the outputs. For
example, in testing an XSLT processor, we assume that the output is a
single file, of varying type according to the scenario, unless the catalog
data dictates otherwise. If no default rule exists, then the output-file
element of the scenario becomes required.
E. OASIS chooses a set of categories, if desired. The "category" attribute
may be removed if categories won't be used.
F. Identify all areas in which the specs grant discretionary choices to
processor developers. (There is some pressure for W3C Working Groups to do
this as part of producing their Recommendations.) Catalog the available
choices in each area.
G. Identify all gray areas in the specs as best you can. Catalog the
available choices in each area.

This document describes information that should be associated with a test
case for (1) identification, (2) description and mapping to spec
provisions, (3) filtering (choosing whether or not to execute with a given
processor) and discretionary choices, and finally (4) some operational
parameters. Each test case is represented by input files and the
operational parameters to set up all inputs for the particular test case.
The data described below can be accumulated into a catalog of test cases,
in XML of course, with one <test-case> element for each case. However, good
code management practices would probably dictate that the creators of these
cases retain the definitive data in the primary input file. (For XSLT, the
primary input is the stylesheet.) A catalog file can be generated from the
primary inputs. The catalog file would be the definitive version as far as
the OASIS package is concerned. That is, we expect the submitter to provide
a catalog and a file tree of test cases (including allegedly-correct
results), and to coordinate with OASIS on a "Title" for the submission.

Within the catalog, each test is represented as a <test-case> element with
numerous sub-elements. Most parameters would be interpreted as strings.
Values that refer to versions, dates, and the like can be interpreted
numerically, specifically in inequality relations. Excerpts of a potential
DTD are shown.

(1) IDENTIFICATION
The outermost element of a submitted catalog is <test-catalog> with a
"Title" attribute to identify it. This design allows various parties to
contribute test cases and catalogs thereof into an OASIS committee. The
globally-unique "Title" string should also be valid as a directory name in
all prominent operating systems. The title can be suggested by the
submitter, but must be approved by OASIS. Thus, Lotus would submit a test
suite called "Lotus" and the OASIS procedures would load it into a "Lotus"
directory (assuming that the name "Lotus" is acceptable to the OASIS
committee).

<!ELEMENT test-catalog ( test-case* ) >
<!ATTLIST test-catalog Title CDATA #REQUIRED >
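
For illustration, a submitted catalog using the "Lotus" title from above
would be shaped like this (case content elided):

<test-catalog Title="Lotus">
  <test-case> ... </test-case>
  <test-case> ... </test-case>
</test-catalog>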

A submitted suite can have arbitrary directory structure under its
top-level directory, captured in the "Identifier" element for each case,
with forward slashes as the directory delimiters. The actual name of the
particular file (and test case) would be in the "Source" element, which
should be a valid file name in all prominent operating systems. The
Identifier contains the Source string at the end, but not the Title at the
beginning. Note that the test suite may contain directories that have no
test cases, only utility or subsidiary files.

<!ELEMENT test-case ( Title? , Source , Identifier , Creator* , Date? ,
  purpose , elaboration? , spec-citation+ , discretionary? , gray-area? ,
  scenario ) >
<!-- Dublin Core ("DC") used for convenience/standardization where possible
for meta-data level of this DTD, here we replace FilePath with Identifier,
per http://purl.org/DC/documents/rec-dces-19990702.htm, "example formal
identification systems include the Uniform Resource Identifier (URI)
(including the Uniform Resource Locator (URL))."  Hereafter, quotes within
comments are from the URI above. -->

<!-- DC Title used in place of SuiteName, per "name by which the resource is
  formally known". This must also meet filename constraints: letter first,
  no spaces, "reasonable" length -->
<!ELEMENT Title ( #PCDATA ) >
<!-- DC Source, per "best practice is to reference the resource by means of a
  string or number conforming to a formal identification system," but must
  meet filename constraints and have no internal periods. This names a
  single test case. -->
<!ELEMENT Source ( #PCDATA ) >
<!-- Identifier uses forward slashes as separators, begins with the name of a
  directory that is directly within the top directory named per Title, and
  ends with the name-part in Source. -->
<!ELEMENT Identifier ( #PCDATA ) >
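
As a hypothetical illustration (directory and file names invented), a case
whose stylesheet lives at Lotus/string/substring17.xsl would carry:

<Source>substring17</Source>
<Identifier>string/substring17</Identifier>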

OASIS may bless a particular hierarchical organization of test cases. If
so, then an attribute called "category" should be used to track where the
test fits in OASIS' scheme of categories. That way, OASIS categories will
not dictate the directory structure nor the case names. The goal is that no
case should be marked as belonging to more than one category. A category
named "Mixed" is needed when there isn't a clean partitioning.

<!ATTLIST test-case
  category ( !*!YOUR CATEGORY NAMES!*! | Mixed ) #IMPLIED >
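
As a purely illustrative customization (these category names are invented),
an XSLT-oriented list might come out as:

<!ATTLIST test-case
  category ( XSLT-Structure | XPath-Expressions | Mixed ) #IMPLIED >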

Submitters should be encouraged to use the "Creator" element(s) to name
contributors at the individual-person level. They may also wish to use an
element called "Date" to record, as yyyy-mm-dd, the date stamp on the test
case. That will allow the submitter to match cases with their own source
code management systems, and will likely aid in future updates, either due
to submitter enhancements or W3C changes. OASIS reserves the right to
insert this element, containing the date received, if no value was supplied
by the submitter.

<!-- Dublin Core Creator instead of Author -->
<!ELEMENT Creator ( #PCDATA ) >
<!-- DC/ISO-8601 Date for the date of submission (from creator's POV) -->
<!ELEMENT Date ( #PCDATA ) >
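
Filled in, the pair might read (values invented for illustration):

<Creator>David Marston</Creator>
<Date>2001-06-12</Date>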

(2) DESCRIPTION AND MAPPING TO SPEC PROVISIONS
Each test must have a "purpose" element whose value describes the point of
the test. This string should be limited in length so that the document
generated by the OASIS tools doesn't ramble too extensively. There would
also be an optional "elaboration" element whose length is unlimited and
which may contain some HTML tags. Nothing in this document should be
construed as discouraging the use of comments elsewhere in the inputs for
clarification.

<!ELEMENT purpose ( #PCDATA ) ><!-- Max 255 characters, no new-lines -->
<!ELEMENT elaboration ANY >
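
A sketch of the pair, with invented wording:

<purpose>Test substring() with a negative start position.</purpose>
<elaboration>Free-form text of unlimited length, possibly containing some
  HTML markup, goes here.</elaboration>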

There must be one or more "spec-citation" elements to point at provisions
of the spec that are being tested. Expect that even simple cases will need
several citation elements. The pointing mechanism is the subject of a
separate design. The more exact it is, the less need there is for an
"elaboration" string, and also the better inversion from the spec to the
test cases. The spec-citation element contains a "Rec" attribute to say
which recommendation (XSLT, XPath, etc.), a "Version" sub-element to say
which version thereof, and some form of text pointer. To encourage
submissions before the pointer scheme is final, the Committee may need to
accept alternative sub-elements of different names: <section> for a plain
section number, <doc-frag> for use of fragment identifiers that are already
available in the spec, and <OASISptr1> for the first OASIS pointer scheme,
as seen in the early work of OASIS' XSLT/XPath Conformance TC. OASIS
pointers of types 2 and up may be necessary in the future, hence the
extendable design.

<!-- There must always be at least one spec-citation element for the spec that
is the primary subject of the test suite, and optionally other
spec-citation elements can be added as appropriate -->
<!ELEMENT spec-citation ( place , Version , version-drop? , errata-add? ,
errata-drop? ) >
<!ATTLIST spec-citation
  Rec ( !*!YOUR LIST OF NORMATIVE DOCUMENTS!*! ) #REQUIRED >
<!ELEMENT place ( #PCDATA ) ><!-- syntax of content depends on Type -->
<!-- Type is a Dublin Core keyword -->
<!ATTLIST place Type ( section | doc-frag | OASISptr1 ) #REQUIRED >
<!-- More pointer types to come? -->
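
A citation in the plain-section form might read (the section number is
invented):

<spec-citation Rec="XPath">
  <place Type="section">4.2</place>
  <Version number="1.0"/>
</spec-citation>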

(3) FILTERING AND DISCRETIONARY CHOICES
Each pertinent standard should be cited by version number, but also flagged
as to its errata status, if relevant. The version elements mentioned above
are numeric so that inequality tests may be applied. All tests should use
XML 1.0 if at all possible, but again, we have noted the potential to
specify a higher-numbered version. Any test that is essentially about a
newer spec, such as XBase, should specify the lowest practical level of all
other specs.

<!-- Version is another Dublin Core element; must be numeric -->
<!ELEMENT Version EMPTY >
<!ATTLIST Version number CDATA #REQUIRED >
<!-- version-drop, if specified, must be strictly greater (later) than
Version -->
<!ELEMENT version-drop EMPTY >
<!ATTLIST version-drop number CDATA #REQUIRED >

Errata are independent of newer spec versions, and multiple errata could be
issued per version. The flexible approach is to have a spec-citation
sub-element named "errata-add" that contains a numeric value (0 for the
base document) like the E-number in the XSLT errata; "errata-drop" is
numerically larger and indicates that the test case is no longer pertinent
as of that errata version. However, not all Working Groups are numbering
their errata, so there is some safety in using dates. Date is a required
attribute and should be in ISO-8601 format, which will sort numerically.
The add and drop levels would allow a test case to be marked as being
relevant for errata that later get further clarified. The errata-drop must
always be numerically greater than errata-add. Spec errata parameters need
only be specified where the test applies to a specific erratum, or the base
document only, because they are used for filtering.

<!-- errata-add and errata-drop should be rendered as dates with best
  practice of ISO 8601, yyyy-mm-dd, W3CDTF. Errata numbers should be put in
  element content, if used at all. -->
<!ELEMENT errata-add ( #PCDATA ) >
<!ATTLIST errata-add Date CDATA #REQUIRED >
<!-- errata-drop, if specified, must be strictly greater (later) than
  errata-add -->
<!ELEMENT errata-drop ( #PCDATA ) >
<!ATTLIST errata-drop Date CDATA #REQUIRED >
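
A case that became pertinent at erratum 5 and was overtaken at erratum 9
might carry (numbers and dates invented):

<errata-add Date="2000-12-06">5</errata-add>
<errata-drop Date="2001-05-15">9</errata-drop>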

When a spec grants discretionary choices to the processor developer, OASIS
should give these choices names if the spec authors did not already do so.
These choices should be encoded in elements which act as excluders when a
test suite is assembled. By serving as excluders, we eliminate the need to
specify all of them in every test case; if a discretionary item is not
mentioned, the test case doesn't care about that item and should be
included for any choice made on that item. The value can be expressed as a
keyword from a set of keywords designated by the Committee. For example,
a <discretionary-choice> element with name="attribute-name-not-QName" carries
a behavior attribute of either "raise-error" or "ignore" to show that the
case should be excluded when the processor under test made the other choice
on this item. Depending on the choice, there could be parallel tests
(differently
named), with distinct parallel "correct output" files, for different values
of the choice, and only one would be selected in any assembly of a test
suite. The questionnaire to developers about discretionary choices may
allow "moot" as a response in some situations, but one cannot use "moot" as
a behavior value in the test case catalog because, as stated above, moot
items are just omitted from the "discretionary" element.

<!ELEMENT discretionary ( discretionary-choice )* >
<!ELEMENT discretionary-choice EMPTY >
<!ATTLIST discretionary-choice name CDATA #REQUIRED behavior CDATA #REQUIRED >
!*! Where do we validate the set of names? !*!
!*! How do we limit the behaviors allowed on each individual choice? !*!
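
Using the example item above, a case that expects the error behavior would
be cataloged as:

<discretionary>
  <discretionary-choice name="attribute-name-not-QName"
    behavior="raise-error"/>
</discretionary>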

Vague areas in the spec are handled in the same manner as the discretionary
items above, with <gray-area> substituting for the <discretionary> and the
abbreviated names assigned by the Committee. This is where the errata level
is likely to come into play, since errata should clear up some vague
areas. Once again, the tester has to ask the developer to answer questions
about their design decisions, and the answers should be encoded using
keywords which can then be matched to the <gray-area> elements. One test
case could serve as both a gray-area for one choice and as the lone case
for errata-add, when that gray-area choice is the one that the errata later
chose.

<!ELEMENT gray-area ( gray-choice )* >
<!ELEMENT gray-choice EMPTY >
<!ATTLIST gray-choice name CDATA #REQUIRED behavior CDATA #REQUIRED>
!*! Where do we validate the set of names? !*!
!*! How do we limit the behaviors allowed on each individual choice? !*!
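
The markup parallels the discretionary case (the name and behavior here are
invented placeholders):

<gray-area>
  <gray-choice name="whitespace-in-output" behavior="preserve"/>
</gray-area>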

(4) OPERATIONAL PARAMETERS
At Lotus, we have thought a lot about how comments in the test file can
describe the scenario under which the test is run, though we have not yet
implemented most of the ideas. These parameters describe inputs and
outputs, and a <scenario> element could describe the whole situation
through its "operation" and "compare" attributes. The "operation" value
describes how to run the test, while "compare" describes how to evaluate
the outcome. In the "standard" Operation scenarios, we construct the names
of the inputs from the <Source> element, and output is expected in one file
that could then be suitably compared to the "correct output" file.
"Compare" options include "XML", "HTML", and "Text", corresponding to the
types of output and the possible methods of comparison. One or more
<input-file> and <output-file> elements could be used to specify other
files needed or created, and the values of these elements should permit
relative paths. A single input-file element could be used to specify that
one of the heavily-used standard input files should be retrieved instead of
a test-specific input file. (Lotus has hundreds of tests where the XML
input is just a document-node-only trigger, and we would benefit from
keeping one such file in a Utility directory.) The implication of the
latter rule is that if there exists even one input-file element, no inputs
are assumed and all must be specified.

<!ELEMENT scenario ( input-file* , output-file* , param-set? , console? ) >
<!ATTLIST scenario
  operation ( !*!YOUR LIST OF WAYS TO OPERATE!*! ) #REQUIRED
  compare ( !*!YOUR LIST OF OUTPUT TYPES!*! | manual ) #REQUIRED >
<!ELEMENT input-file ( #PCDATA ) >
<!ELEMENT output-file ( #PCDATA ) >
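
As a sketch, assuming the Committee names the ordinary operation "standard"
and reusing the invented file names from above, a case that substitutes the
shared trigger document would carry:

<scenario operation="standard" compare="XML">
  <input-file>substring17.xsl</input-file>
  <input-file>../Utility/trigger.xml</input-file>
</scenario>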

An operation keyword could imply that more or fewer inputs are needed than
in the "standard" operation.

An operation keyword could imply that extra invocation options or
environment settings are needed. The Committee could push responsibility to
the processor developer to provide a script/batch mechanism to take values
from standardized data and map them to the specific syntax of their
processor. The part below shows the connection to the data that the
script/batch mechanism would apply. This is essentially a special-purpose
input file. The most likely formats are:
(1) (type) name=value [new-line delimits?]
(2) a simple XML element with name and type attributes
There should be allowance for simple options, such as a one-word option
that can be present on the command line.

<!-- This needs further design. Assume it designates an input file. -->
<!-- This value is only relevant when the operation keyword is of certain
values -->
<!ELEMENT param-set ANY >
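
If the Committee chose format (2), the content might be sketched like this
(the param element and its attributes are invented, pending that further
design):

<param-set>
  <param name="validate" type="flag"/>
  <param name="encoding" type="string">ISO-8859-1</param>
</param-set>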

We also want to be able to test that a message was issued (as in
xsl:message) and that an error was issued. The "console" element will be
used to designate strings that must be present in either the standard
output or standard error stream. (The test lab would be responsible for
setting up capture of the console output.) The compare keyword "message"
can designate that, when running this test, the harness should capture the
standard/error output into a file and ignore the file output it would
normally check.
The Committee may need compare keywords like "message-HTML" to say that
both the console output and an HTML file must be compared. For console
output, the test of correctness is to grep for the designated string in the
captured output stream. If a tester wished, they could get actual error
message strings from the processor developer and refine the test harness to
search for those exact messages in error output. In that case, the string
in the console element is used as an indirect reference to the actual
string.

<!-- should contain actual error report output string,
  or could be pointer to another file containing such strings.
  Less desirable: description of the problem. -->
<!ELEMENT console ( #PCDATA ) >
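
A case whose point is to elicit a diagnostic might carry (the string is
invented for illustration):

<console>circular variable reference</console>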

A compare value of "manual" would be used sparingly, for output whose
format must meet constraints but whose actual data is only known on the
fly. (Examples: fetch current time, generate random numbers.) Additional
"scenario" keywords can be devised as necessary, but OASIS should control
the naming. The Committee might want to allow names beginning with a
specific letter to be local to particular test labs. For example, we would
reserve all names beginning with "O-" and instruct the test labs that they
should put their name as the next field, then another hyphen, then their
local scenario keywords (e.g., O-NIST-whatever) that allow them to set up
local conditions (e.g., use of APIs) as needed.

HOW IT WORKS
When rendering a specific instance of the test suite, a test case can be
excluded on any one of the following bases:
(1) A discretionary item of a given name is set to a different value.
(2) A gray-area item of a given name is set to a different value.
(3) The spec-citation/Version value on the test case is numerically larger
than what the processor implements. (This could be for any spec named, not
just XSLT.)
(4) There is a spec-citation for a spec (e.g., XBase) that the processor
claims not to implement.
(5) The test lab wishes to test against an errata level that is numerically
lower than the errata-add or higher than the errata-drop for a spec. In the
former case, there should be a cross-check against gray-area items.
Thus, it is the "user" (test lab) who renders a test suite by deciding
which spec version and errata level they wish to test, and by specifying
the settings of the discretionary and gray-area items they know. Before
running the specific rendition, they must ascertain how they will handle
those tests that run in special scenarios, taking into account the
operating system where the tests will run and processor-specific input and
output design.
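
To make the rendering step concrete, here is a minimal sketch in XSLT
itself, reduced to the version filter alone (the impl-version parameter is
invented; a real rendition tool would apply all five exclusion tests above):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- hypothetical: the highest spec version the processor implements -->
  <xsl:param name="impl-version" select="1.0"/>
  <xsl:template match="test-catalog">
    <test-catalog Title="{@Title}">
      <!-- keep only cases that cite no Version beyond the processor's -->
      <xsl:copy-of select="test-case[not(spec-citation/Version/@number
        &gt; $impl-version)]"/>
    </test-catalog>
  </xsl:template>
</xsl:stylesheet>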

Note that the test suite itself is not filtered by scenario values. The
test lab may wish to devise a harness that can be configured to exclude
certain scenarios from some runs, but I think we want to encourage testing
and reporting against the full range of scenarios.

When a test case is included, it is run according to the values in the
<scenario> element. If inputs are specified, they are marshalled as
necessary. If no inputs are specified, default rules are applied (if
defined by the Committee). In some scenarios, special steps must be taken
to capture the output. In the standard scenarios, if no outputs are
designated, default rules are applied (if defined by the Committee).


