OASIS Mailing List Archives

xslt-conformance message


Subject: Notes about name management, and questions not yet resolved


Based on the face-to-face part of Wednesday's meeting, it's probably
worthwhile to produce another iteration of the test-catalog design and
apply it to the prototype. Likewise, Ken's data for customizing the
now-generic test catalog to XSLT would change. We can make that happen
in a timely fashion with some email voting. This message will summarize
the findings and issues as they are known up to now. A later message will
show how the various name-parts are used by a test lab.

This message is not an actual call for votes. It opens discussion on some
issues. Based on past behavior of the committee members, email voting
has a better chance of working than calling an extra meeting before
August 21, but first we should see how lively the online discussion
becomes. Lurkers who work for test labs are encouraged to comment!

1. The most radical change from Iron Man and previous designs is that we
now advocate NOT calculating default filenames by a formula. Thus, the
submitter would have to name every input file used and every output file
generated.

2. After debating what could be done with the various pieces of a
fully-qualified filename, we favored NOT assembling file names from a
base-name and various extensions/tags/filetypes (e.g., .xsl). Thus, the
submitter would always provide the complete name of each file in the
catalog data.

DEFINITION: An input or output file is a "primary" file if its name is
specified as an argument when invoking the processor. Thus, an XSLT
processor has two primary inputs (one each of the stylesheet and data
type) and one primary output (type can vary). All other files, which
would be named within the input files, are "secondary" files and can
also have type designators. Secondary files can have relative paths, but
the primary must not, so that test labs can adjust the current directory
as needed by the processor under test. Imported or included stylesheets
are secondary stylesheet files, regardless of how deep they are in the
import tree. A file read by the document() function is a secondary data
file. In XSLT 2.0, the anticipated xsl:document instruction could create
secondary output files of various types. The console log in file form
might be a primary output of the "log" filetype--any reaction? We still
need a flag at the scenario level to designate that console output is
germane to the test. It's currently compare="message" but it could be
compare-console="yes" for better readability. The test lab must alter
the command line (or other form of invocation) appropriately to capture
the console output.

For the input-file element, we propose a "role" attribute that has one
of these values: primary-data, secondary-data, primary-stylesheet,
secondary-stylesheet, and secondary-params. Thus, when Germanium Man is
customized for XSLT, those five values are the list of valid roles for
the input. While we could genericize the primary/secondary aspect later,
the idea of roles is too new and should be shaken down in a prototype.
At least one input file must be designated as primary (see (3) below).
The output-file has three attributes: compare, role (primary/secondary),
and encoding. I think compare should remain separate because there are
so many places where you need it for program branching. A value of
compare="ignore" can be used to mark the situation where we think an
output file might be created but it's not germane (it's still a file that
needs to be deleted after the test run). I also mentioned in Germanium
Man that we might want compare="manual" for those files that have some
unpredictable bytes (e.g., generated IDs).
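
As a rough illustration of how the role rules above might be checked, here
is a minimal Python sketch. The catalog layout is an assumption made only
for this example: it supposes that each <input-file> carries its file name
as element text and its role in a "role" attribute. The function name
check_input_files and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The five role values proposed above for <input-file> in the XSLT customization.
INPUT_ROLES = {
    "primary-data", "secondary-data",
    "primary-stylesheet", "secondary-stylesheet",
    "secondary-params",
}

def check_input_files(test_case):
    """Return a list of problems found in one <test-case> element.

    Assumption: the file name is the text content of <input-file>.
    """
    problems = []
    roles = []
    for f in test_case.iter("input-file"):
        role = f.get("role")
        name = (f.text or "").strip()
        if role not in INPUT_ROLES:
            problems.append(f"{name}: unknown role {role!r}")
            continue
        roles.append(role)
        # Primary files must not carry a path; secondary files may.
        if role.startswith("primary-") and "/" in name:
            problems.append(f"{name}: primary file must not have a path")
    # At least one input file must be designated as primary.
    if not any(r.startswith("primary-") for r in roles):
        problems.append("no primary input file")
    return problems

SAMPLE = """
<test-case>
  <input-file role="primary-stylesheet">axes01.xsl</input-file>
  <input-file role="primary-data">axes01.xml</input-file>
  <input-file role="secondary-stylesheet">inc/common.xsl</input-file>
</test-case>
"""
print(check_input_files(ET.fromstring(SAMPLE)))  # prints []
```

A check like this could run when catalogs are merged, so that a missing
primary input is caught before a test lab sees the case.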

3. Interestingly, we observed no particular need for the submitter to
"name" the cases if (2) is enforced. For appearances, a case name can be
derived by taking the basename of the input-file designated as the
primary stylesheet input, or the primary data input for those cases that
have no stylesheet (embedded stylesheets within data). The likely uses of
the case-name are in log-file entries and certain directory schemes where
a directory named for the case-name contains the results of multiple runs.
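
The derivation in (3) could be sketched as follows. This is only an
illustration: it assumes "basename" means the file name with its final
extension removed, and it represents a test case as a plain list of
(role, filename) pairs. The function name derive_case_name and the file
names are hypothetical.

```python
import posixpath

def derive_case_name(input_files):
    """Derive a case name from a list of (role, filename) pairs.

    The primary stylesheet is preferred; the primary data file is the
    fallback for cases with no separate stylesheet.
    """
    for wanted in ("primary-stylesheet", "primary-data"):
        for role, name in input_files:
            if role == wanted:
                # Assumption: the basename is the name minus its last extension.
                return posixpath.splitext(posixpath.basename(name))[0]
    raise ValueError("no primary input file")

print(derive_case_name([("primary-data", "select07.xml"),
                        ("primary-stylesheet", "axes01.xsl")]))  # axes01
print(derive_case_name([("primary-data", "embed02.xml")]))       # embed02
```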

4. Discussion revealed some conflicting uses for the output-file name(s).
Our short-term plan was to say that the name in the catalog is the name
for both the output as generated and for the "raw" version of the
"correct" output file. The InfoSetized and canonicalized versions of both
probably have matching basenames again, but canonicalized-file names are
actually the concern of the Test Lab.

5. Given (4), we need to put the InfoSetized "correct" output files, and
the raw "correct" output if we deliver that as planned, in a safe place.
Kirill proposed putting a subdirectory under each directory that contains
test cases, which means that we would potentially be grafting directories
onto the submitter's tree, and/or asking submitters to put raw correct
output files there when they submit them. The OASIS Committee must assign
a fixed name for this directory! An alternative is to have one or two
parallel trees, named by us, of correct output for all the cases in the
merged suite (two trees if we want to provide raw and InfoSetized in
separate trees). The test lab is responsible for moving or renaming the
primary output file after it's generated (or even assigning it a relative
path, if they dare). If (2) is enforced for output, the catalog should be
providing the full name with filetype extension, but a submitter may use
the same extension for all types, and it's still the lab's responsibility
to deal with that name.

6. The Title (or "suite-name") and Identifier ("file-path") are always
used together when generating directory names and fully qualified file
names (the latter also needs more data). The only reason they are
separate is that the OASIS Committee sets the Title while the submitters
name and organize directories in their sub-tree as they wish. A test lab
will need to get the Title for each test case, which is most convenient
if it's a datum within each test-case element in the merged catalog. We
could put it there in the merging process or require submitters to put it
there. In the former case, we could require that Title be present but
empty. In the latter case, we have to accept the submitter's Title or
assign one to them before they submit their catalog. As an attribute of
<test-suite>, this datum could be named "name" instead of "Title", while
"submitter" is probably the long-form name of the organization.

7. The file-path ("Identifier") has slashes within it to designate the
directory levels. It could also have a leading or trailing slash for
ease of concatenation. We assumed that the other name parts in the
concatenation (suite-name and filenames) will not have slashes. (All /
characters would be translated to _ to create an HREF-clean name when
necessary.) Notice that we assume that the file-path must immediately
follow the suite-name; use of the suite-name in any other position could
create a naming conflict.
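
The concatenation described in (7) might look like the sketch below. It
is a guess at one workable scheme, not a committee decision: it tolerates
a leading or trailing slash on the file-path, rejects slashes in the
other parts, and puts the file-path immediately after the suite-name. The
function names and the sample values are hypothetical.

```python
def qualified_name(suite_name, file_path, filename):
    """Join suite-name, file-path and filename with single slashes."""
    for part in (suite_name, filename):
        if "/" in part:
            raise ValueError(f"slash not allowed in {part!r}")
    # Dropping empty pieces absorbs any leading or trailing slash.
    levels = [p for p in file_path.split("/") if p]
    return "/".join([suite_name, *levels, filename])

def href_clean(name):
    """Translate every / to _ to get an HREF-clean name."""
    return name.replace("/", "_")

full = qualified_name("ExampleSuite", "/axes/child/", "axes01.xsl")
print(full)              # ExampleSuite/axes/child/axes01.xsl
print(href_clean(full))  # ExampleSuite_axes_child_axes01.xsl
```

Note that the translated form is only unambiguous if the name parts
themselves contain no underscores, which the message does not address.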

8. Submitted catalogs have an outer <test-catalog> element with many
<test-case> elements as children. Our merged catalog could have the same
structure or could place the <test-catalog> elements as children of an
outer <merged-catalog> element. The latter does allow Title to be an
attribute of <test-catalog> instead of being in every <test-case>, but it
means that the DTD differs between the submitted and merged catalogs.
Some choices for (6) also mandate different DTDs.

9. We propose that the test catalog must limit its character-set usage to
that part of ASCII allowed in XML documents.
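
One possible reading of (9), sketched in Python: the allowed set is tab,
line feed, carriage return, and the codes 0x20 through 0x7F, which is the
portion of ASCII covered by the XML 1.0 Char production. The function
name is hypothetical, and whether DEL (0x7F) should really be accepted is
left open here.

```python
# ASCII code points that XML 1.0 permits in a document.
ALLOWED = {0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))

def disallowed_chars(text):
    """Return (offset, character) pairs that fall outside the allowed set."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) not in ALLOWED]

print(disallowed_chars("<test-case/>\n"))  # []
print(disallowed_chars("caf\u00e9"))       # [(3, 'é')]
```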

DATA REQUIREMENTS: If we agree to the double use of the output-file names
as discussed in (4) above, then Iron/Germanium Man has at least identified
all the name pieces we want from submitters. The new data items are the
designators of the filetypes and primary/secondary status.

We determined that a test lab might generate 4 types of scripts ("batch
files") from the catalog data:
A. Setup script to put inputs in the proper place
B. Run script to run test cases
C. Compare script to process and compare actual and reference output
D. Cleanup script to move or delete output and log files
We believe all necessary data is available among the "name parts" and
other data in the <scenario> element.
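
To make the four script types concrete, here is a small Python sketch
that emits one shell line of each kind for a single test case. Nearly
everything in it is an assumption for illustration: the dictionary keys,
the directory names, the argument order, and the $PROCESSOR and $COMPARE
shell variables, which stand in for whatever processor and comparison
tool a test lab actually uses.

```python
import posixpath
import shlex

def scripts_for_case(case, src_dir, work_dir, ref_dir):
    """Build setup/run/compare/cleanup command lines for one test case."""
    q = shlex.quote
    join = posixpath.join
    inputs = [case["stylesheet"], case["data"]]
    out = case["output"]
    # A. Setup: put the inputs where the processor will look for them.
    setup = [f"cp {q(join(src_dir, n))} {q(join(work_dir, n))}" for n in inputs]
    # B. Run: primary files are named without paths, so change directory first.
    run = [f"cd {q(work_dir)} && $PROCESSOR {q(case['data'])} "
           f"{q(case['stylesheet'])} {q(out)}"]
    # C. Compare: skipped when the catalog says compare="ignore".
    compare = []
    if case["compare"] != "ignore":
        compare.append(f"$COMPARE {q(join(work_dir, out))} {q(join(ref_dir, out))}")
    # D. Cleanup: the output is deleted even when it was not compared.
    cleanup = [f"rm -f {q(join(work_dir, out))}"]
    return {"setup": setup, "run": run, "compare": compare, "cleanup": cleanup}

case = {"stylesheet": "axes01.xsl", "data": "axes01.xml",
        "output": "axes01.out", "compare": "xml"}
for kind, lines in scripts_for_case(case, "src", "work", "ref").items():
    for line in lines:
        print(f"{kind}: {line}")
```

A real lab would also have to handle secondary files, console capture,
and the InfoSetizing step, none of which is attempted here.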

First draft of questions for voting:
1. Should we require submitters to make an explicit <input-file> or
<output-file> entry for every file involved in every test case?
2. Should we require the submitter to be explicit about all filetype
extensions of all files in the catalog?
3. Are we comfortable with a catalog that does not have a case name
supplied for each test case, given that there is a formula for
deriving one from required data?
4. Should the names of InfoSetized output files be derived by adding a
prefix, adding a suffix, or changing the extension (existing suffix)
of the raw output file?
5. Where should we deliver the correct outputs? Is the name of an
InfoSetized output derived from the name of a raw output by adding a
prefix, adding a suffix, or changing the existing extension (if any)?
(In thinking about this, keep in mind that having a known extension
like .xml is a convenience for launching apps in some environments.)
6. Should the submitter have the Title (suite-name) in their catalog as
submitted? Or have an empty placeholder for it?
7. Should the file-path element have a leading slash? Trailing slash?
8. How important is it for the DTD of the merged catalog to be exactly
the same as it is for the submitted catalogs? Should the <test-case>
elements be at the same depth in both?
9. Should the test catalog be submitted with only ASCII characters?
10. Should we replace the Dublin Core name Title with "test-suite"?
11. Should we replace the Dublin Core name Identifier with "file-path"?
12. If Title and Identifier are dropped: Do you want to retain the
non-controversial Dublin Core names Creator, Date, and Version?

Interlocks among questions:
(2) and (4) should be voted in a consistent fashion.
(6), (8), (10) should be voted in a consistent fashion.
(10) and (11) could be voted the same (keep/drop Dublin Core names) or
not. Only if you vote to drop Dublin Core on both will (12) be raised.

Other questions, not discussed in any depth at the meeting:
13. To what extent should we provide guidelines to submitters about the
current directory at the time the test is run? We can anticipate that
test labs will want guidelines from us.
14. If a submitter wants to update their test suite, which could mean
adding, dropping, or changing test cases, how do we want them to catalog
the revision?
15. For error cases, we could provide compare data of two types: an
<elaboration> string explains the error for human understanding, and/or
a <substring> element is something to grep for in the console output.
How much help should we provide to test labs in this area?
16. Do we want to track a Date of a whole submission in this catalog?
17. We are continuing to defer design of an input file that carries the
parameter-setting data. The mention of a secondary-params type of input
file is strictly meant as a placeholder in case we decide that (a) this
data should be conveyed in a file, and (b) the file can be treated the
same as other inputs for purposes of file management.
.................David Marston


