Subject: Notes about name management, and questions not yet resolved
Based on the face-to-face part of Wednesday's meeting, it's probably worthwhile to produce another iteration of the test-catalog design and apply it for the prototype. Likewise, Ken's data for customizing the now-generic test catalog to XSLT would change. We can make that happen in a timely fashion with some email voting. This message will summarize the findings and issues as they are known up to now. A later message will show how the various name-parts are used by a test lab.

This message is not an actual call for votes. It opens discussion on some issues. Based on past behavior of the committee members, email voting has a better chance of working than calling an extra meeting before August 21, but first we should see how lively the online discussion becomes. Lurkers who work for test labs are encouraged to comment!

1. The most radical change from Iron Man and previous designs is that we came around to advocating that we will not calculate filenames by a formula for default names. Thus, the submitter would have to name every input file used and output file generated.

2. After debating what could be done with the various pieces of a fully-qualified filename, we favored NOT assembling file names from a base-name and various extensions/tags/filetypes (e.g., .xsl). Thus, the submitter would always provide the complete name of each file in the catalog data.

DEFINITION: An input or output file is a "primary" file if its name is specified as an argument when invoking the processor. Thus, an XSLT processor has two primary inputs (one each of the stylesheet and data type) and one primary output (type can vary). All other files, which would be named within the input files, are "secondary" files and can also have type designators. Secondary files can have relative paths, but the primary must not, so that test labs can adjust the current directory as needed by the processor under test.
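
The path rule in the DEFINITION could be checked mechanically. What follows is only a rough sketch, not part of any catalog design: `FileEntry` and `path_problems` are hypothetical names, the in-memory representation is assumed, and "has a path" is taken to mean "contains a forward slash".

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    # Hypothetical record for one catalog file entry (not a real schema).
    name: str      # complete file name as the submitter wrote it
    primary: bool  # True if named as an argument when invoking the processor


def path_problems(entries):
    """Return one message per primary file whose name carries a path.

    Assumption: a path is recognized by a forward slash in the name.
    Secondary files are allowed relative paths, so they are not checked.
    """
    problems = []
    for entry in entries:
        if entry.primary and "/" in entry.name:
            problems.append(f"primary file {entry.name!r} must not have a path")
    return problems
```

A submitter or the committee might run such a check over a catalog before merging; an empty list would mean that no primary file violates the rule.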
Imported or included stylesheets are secondary stylesheet files, regardless of how deep they are in the import tree. A file read by the document() function is a secondary data file. In XSLT 2.0, the anticipated xsl:document instruction could create secondary output files of various types.

The console log in file form might be a primary output of the "log" filetype--any reaction? We still need a flag at the scenario level to designate that console output is germane to the test. It's currently compare="message" but it could be compare-console="yes" for better readability. The test lab must alter the command line (or other form of invocation) appropriately to capture the console output.

For the input-file element, we propose a "role" attribute that has one of these values: primary-data, secondary-data, primary-stylesheet, secondary-stylesheet, and secondary-params. Thus, when Germanium Man is customized for XSLT, those five values are the list of valid roles for the input. While we could genericize the primary/secondary aspect later, the idea of roles is too new and should be shaken down in a prototype. At least one input file must be designated as primary (see (3) below).

The output-file has three attributes: compare, role (primary/secondary), and encoding. I think compare should remain separate because there are so many places where you need it for program branching. A value of compare="ignore" can be used to mark the situation where we think an output file might be created but it's not germane (it's still a file that needs to be deleted after the test run). I also mentioned in Germanium Man that we might want compare="manual" for those files that have some unpredictable bytes (e.g., generated IDs).

3. Interestingly, we observed no particular need for the submitter to "name" the cases if (2) is enforced.
For appearances, a case name can be derived by taking the basename of the input-file designated as the primary stylesheet input, or the primary data input for those cases that have no stylesheet (embedded stylesheets within data). The likely uses of the case-name are in log-file entries and certain directory schemes where a directory named for the case-name contains the results of multiple runs.

4. Discussion revealed some conflicting uses for the output-file name(s). Our short-term plan was to say that the name in the catalog is the name for both the output as generated and for the "raw" version of the "correct" output file. The InfoSetized and canonicalized versions of both probably have matching basenames again, but canonicalized-file names are actually the concern of the Test Lab.

5. Given (4), we need to put the InfoSetized "correct" output files, and the raw "correct" output if we deliver that as planned, in a safe place. Kirill proposed putting a subdirectory under each directory that contains test cases, which means that we would potentially be grafting directories onto the submitter's tree, and/or asking submitters to put raw correct output files there when they submit them. The OASIS Committee must assign a fixed name for this directory! An alternative is to have one or two parallel trees, named by us, of correct output for all the cases in the merged suite (two trees if we want to provide raw and InfoSetized in separate trees).

The test lab is responsible for moving or renaming the primary output file after it's generated (or even assigning it a relative path, if they dare). If (2) is enforced for output, the catalog should be providing the full name with filetype extension, but a submitter may use the same extension for all types, and it's still the lab's responsibility to deal with that name.

6.
The Title (or "suite-name") and Identifier ("file-path") are always used together when generating directory names and fully qualified file names (the latter also needs more data). The only reason they are separate is that the OASIS Committee sets the Title while the submitters name and organize directories in their sub-tree as they wish. A test lab will need to get the Title for each test case, which is most convenient if it's a datum within each test-case element in the merged catalog. We could put it there in the merging process or require submitters to put it there. In the former case, we could require that Title be present but empty. In the latter case, we have to accept the submitter's Title or assign one to them before they submit their catalog. As an attribute of <test-suite>, this datum could be named "name" instead of "Title", while "submitter" is probably the long-form name of the organization.

7. The file-path ("Identifier") has slashes within it to designate the directory levels. It could also have a leading or trailing slash for ease of concatenation. We assumed that the other name parts in the concatenation (suite-name and filenames) will not have slashes. (All / characters would be translated to _ to create an HREF-clean name when necessary.) Notice that we assume that the file-path must immediately follow the suite-name; use of the suite-name in any other position could create a naming conflict.

8. Submitted catalogs have an outer <test-catalog> element with many <test-case> elements as children. Our merged catalog could have the same structure or could place the <test-catalog> elements as children of an outer <merged-catalog> element. The latter does allow Title to be an attribute of <test-catalog> instead of being in every <test-case>, but it means that the DTD differs between the submitted and merged catalogs. Some choices for (6) also mandate different DTDs.

9.
We propose that the test catalog must limit its character-set usage to that part of ASCII allowed in XML documents.

DATA REQUIREMENTS: If we agree to the double use of the output-file names as discussed in (4) above, then Iron/Germanium Man has at least identified all the name pieces we want from submitters. The new data items are the designators of the filetypes and primary/secondary status.

We determined that a test lab might generate 4 types of scripts ("batch files") from the catalog data:

A. Setup script to put inputs in the proper place
B. Run script to run test cases
C. Compare script to process and compare actual and reference output
D. Cleanup script to move or delete output and log files

We believe all necessary data is available among the "name parts" and other data in the <scenario> element.

First draft of questions for voting:

1. Should we require submitters to make an explicit <input-file> or <output-file> entry for every file involved in every test case?

2. Should we require the submitter to be explicit about all filetype extensions of all files in the catalog?

3. Are we comfortable with a catalog that does not have a case name supplied for each test case, given that there is a formula for deriving one from required data?

4. Should the names of InfoSetized output files be derived by adding a prefix, adding a suffix, or changing the extension (existing suffix) of the raw output file?

5. Where should we deliver the correct outputs? Is the name of an InfoSetized output derived from the name of a raw output by adding a prefix, adding a suffix, or changing the existing extension (if any)? (In thinking about this, keep in mind that having a known extension like .xml is a convenience for launching apps in some environments.)

6. Should the submitter have the Title (suite-name) in their catalog as submitted? Or have an empty placeholder for it?

7. Should the file-path element have a leading slash? Trailing slash?

8.
How important is it for the DTD of the merged catalog to be exactly the same as it is for the submitted catalogs? Should the <test-case> elements be at the same depth in both?

9. Should the test catalog be submitted with only ASCII characters?

10. Should we replace the Dublin Core name Title with "test-suite"?

11. Should we replace the Dublin Core name Identifier with "file-path"?

12. If Title and Identifier are dropped: Do you want to retain the non-controversial Dublin Core names Creator, Date, and Version?

Interlocks among questions:
(2) and (4) should be voted in a consistent fashion.
(6), (8), (10) should be voted in a consistent fashion.
(10) and (11) could be voted the same (keep/drop Dublin Core names) or not. Only if you vote to drop Dublin Core on both will (12) be raised.

Other questions, not discussed in any depth at the meeting:

13. To what extent should we provide guidelines to submitters about the current directory at the time the test is run? We can anticipate that test labs will want guidelines from us.

14. If a submitter wants to update their test suite, which could mean adding, dropping, or changing test cases, how do we want them to catalog the revision?

15. For error cases, we could provide compare data of two types: an <elaboration> string explains the error for human understanding, and/or a <substring> element is something to grep for in the console output. How much help should we provide to test labs in this area?

16. Do we want to track a Date of a whole submission in this catalog?

17. We are continuing to defer design of an input file that carries the parameter-setting data. The mention of a secondary-params type of input file is strictly meant as a placeholder in case we decide that (a) this data should be conveyed in a file, and (b) the file can be treated the same as other inputs for purposes of file management.

.................David Marston
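
Illustration only: the name handling discussed in design points (3), (6), and (7) might look roughly like the sketch below. Nothing here is a committee decision. The function names and the sample suite-name "Vendor" are hypothetical, "basename" is assumed to mean the file name with its final extension removed, and a single slash is assumed as the joiner between suite-name and file-path.

```python
import posixpath


def case_name(primary_input):
    """Derive a case name from the primary stylesheet (or data) input.

    Assumption: "basename" means the name minus its final extension.
    """
    base, _ext = posixpath.splitext(posixpath.basename(primary_input))
    return base


def case_directory(suite_name, file_path):
    """Join suite-name and file-path, the file-path immediately following.

    Leading or trailing slashes on file_path are tolerated, since the
    question of requiring them is still open.
    """
    return suite_name + "/" + file_path.strip("/")


def href_clean(suite_name, file_path, name):
    """Translate every / to _ to get one HREF-clean name."""
    full = case_directory(suite_name, file_path) + "/" + name
    return full.replace("/", "_")
```

For example, `case_name("axes01.xsl")` gives `axes01`, and `href_clean("Vendor", "axes/child", "axes01")` gives `Vendor_axes_child_axes01`. Because the suite-name and filenames are assumed to contain no slashes, the translated name stays unambiguous only as long as those parts also avoid underscores, which the design has not yet addressed.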