This straw-man proposal offers a methodology and functioning example for validating code list members while providing the validation perspectives described in the minutes of the 2003-08-28 meeting of the code list task group:
This also offers a candidate direction for supplying adjunct information associated with members of code lists.
This document is not meant to be the candidate wording for any NDR rules or recommendations or formal UBL documentation to be supplied to end users; this is only meant for internal committee work.
When the code list task group refines the content of this document into a viable and accepted methodology, the components of this will have to be migrated into appropriate UBL documents elsewhere.
The scope of this experimentation is in two areas:
the validation of authored values used in code-list-based data type content of a UBL instance
the supply of supplemental information of interest to downstream processing applications for purposes such as the presentation and acceptance of the information to and from a human with appropriate translations
The code list task group identified and detailed four validation perspectives for the values found in instance content of the type of a given code list, summarized as:
"standard" constraints supplied by UBL and expected to be used by users
"placebo" constraints delivered "out-of-the-box" by UBL but only providing rudimentary validation of the structure of the contents and not the specific value of the contents
"stock" constraints delivered by UBL and available to the user to override "placebo" constraints
"private" constraints created by the user and used to override either "standard" or "placebo" constraints
The code list task group is creating a catalogue of all code lists in all UBL document models, including among a host of other pieces of information the following fields:
a classification of each code list as requiring either a "standard" or a "placebo" set of constraints to be developed by LCSC and supplied in the final package as the "out-of-the-box" delivered collection of valid values
a classification of each "placebo" code list as needing a "stock" set of constraints to be developed by LCSC and supplied in the final package as an alternative collection of valid values
an abbreviated name to use for the data type of the code list suitable for use in file naming and URI conventions
a normative namespace URI to use for the declared data type of the code list
an informative namespace prefix for the data type of the code list, used to ensure consistent documentation
namespace prefixes are never normative
A highly modular approach is proposed for the mechanical implementation of the data typing of the code list:
the delivered set of files works "out-of-the-box"
configuration is accomplished by copying desired files over top of delivered files
none of the delivered files need to be edited by the user
copies of the original delivered set exist so that users can "restore" their configuration to the "out-of-the-box" configuration
Note that this approach will greatly increase the number of files in the deliverable, but it won't change the complexity of the validation task from the user's perspective.
Related to these files, an outstanding question regarding "importing" or "including" these code list model fragments is highlighted below.
A suite of test files has been created to illustrate the mechanics of validation for a single code list data type. Should this approach be approved, these steps will need to be repeated for every code list.
Consider the example of status codes. A status code is probably not a good choice for "placebo" constraints, but the smallest sample instance uses it and small is good for examples (go with me on this). A document is of a particular status. As delivered in this example, UBL users can use any status they wish as long as it is a token value: the delivered "in-use" validation is just a copy of the "placebo" validation, so it checks only that the value is a W3C Schema token value. UBL delivers a "stock" set of the two values "Original" and "Copy" as valid status code members. Most users would probably find this sufficient, so most would copy the "stock" set over top of the "in-use" set. One user, however, needs a "private" set of three values, "Original", "Copy" and "Fraudulent", and so must create a private set of constraints and copy it over top of the "in-use" file.
The following files (listed alphabetically) are in this illustrative suite:
code-strawman1-20030903.html
this file
code-strawman1-20030903.xml
the DocBook source to this file
CoreComponentParameters-codelist-strawman-1.xsd
unchanged from version 0.81D7
CoreComponentTypes-codelist-strawman-1.xsd
adds a new data type for the <cat:StatusCode> element
derived from an externally-defined data type in a namespace used solely for that data type
test-good-private.xml
a test instance file utilizing a status code value "Fraudulent" only described in the private set of constraints
test-good-stock.xml
a test instance file utilizing only a status code value "Original" described in the stock set of constraints
test.bat
reconfigures the "in-use" XSD file of code list constraints by copying the desired set of constraints
tests the resulting "in-use" XSD against both of the test instance files
UBL-Codelist-Catalog-Private.xml
an untested catalogue-based approach to the configuration mechanism to employ the private set of constraints
UBL-Codelist-Catalog-Stock.xml
an untested catalogue-based approach to the configuration mechanism to employ the stock set of constraints
UBL-CodeList-StatusCode-Placebo-strawman-1.xsd
a lax set of constraints checking only that the coded value is a token
delivered by UBL, and also the delivered contents of the "in-use" XSD file
UBL-CodeList-StatusCode-Private-strawman-1.xsd
a constrained set of values for the status code including "Fraudulent"
as if it had been created by a user of UBL
UBL-CodeList-StatusCode-Stock-strawman-1.xsd
a constrained set of values expected by the UBL committee to be used by the UBL user for functioning interoperability
UBL-CodeList-StatusCode-Use-strawman-1.xsd
the set of constraints actually in use, referenced through an import in the Core Component Types schema fragment
UBL-OrderCancellation-codelist-strawman-1.xsd
the order cancellation document model unmodified
UBL-Reusable-codelist-strawman-1.xsd
the declaration of the status code element as being the code list type
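The relationship between the Core Component Types fragment and the "in-use" file listed above might look roughly like the following. This is only a sketch and not the actual content of CoreComponentTypes-codelist-strawman-1.xsd: both namespace URIs are hypothetical placeholders, and the use of a complex type with simple content for the derivation is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not the delivered Core Component Types fragment.
     Both namespace URIs below are hypothetical placeholders. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:cct="urn:example:ubl:CoreComponentTypes"
            xmlns:status="urn:example:ubl:codelist:StatusCode"
            targetNamespace="urn:example:ubl:CoreComponentTypes"
            elementFormDefault="qualified">

  <!-- Pull in whichever set of constraints currently occupies the
       "in-use" file; this reference never changes. -->
  <xsd:import namespace="urn:example:ubl:codelist:StatusCode"
              schemaLocation="UBL-CodeList-StatusCode-Use-strawman-1.xsd"/>

  <!-- The core component type is derived from the imported code list
       type (assumed here to be a simple type). -->
  <xsd:complexType name="StatusCodeType">
    <xsd:simpleContent>
      <xsd:extension base="status:StatusCodeType"/>
    </xsd:simpleContent>
  </xsd:complexType>

</xsd:schema>
```

Under this arrangement, the importing fragment names only the "in-use" file, so swapping the constraints requires no edit to it.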
Corresponding to the approach detailed above, note the following:
the reusable fragment declares <StatusCode> to be of type cct:StatusCodeType (line 7490)
the core component types declares cct:StatusCodeType (line 96) as being derived from status:StatusCodeType (line 99)
note that I'm deriving from a type in another namespace here because the schema fragments never use <xsd:include>, only <xsd:import>
this design would be simpler if each of the status code type definitions for cct:StatusCodeType were in separate schema fragments that were simply included in the core component types
following the demonstrated practice of only using <xsd:import> for externally defined schema fragments forces the extra step of defining a namespace URI for the code type
note also that I believe a paper from Gunther counsels the use of a distinct namespace URI string for each code list data type, so my use of import would be appropriate in this case
the core component types imports the definition of status:StatusCodeType from the "in-use" file of constraints using the <xsd:import> (line 57)
the "in-use" file of constraints is a copy of the "placebo" file of constraints
the "placebo" file of constraints only needs the field to be a token (line 15)
the "stock" file of constraints declares a set of values UBL believes to be sufficient (lines 16-17)
the "private" file of constraints declares that set needed by a particular user and with values not anticipated by UBL (lines 16-18)
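To make the three variants concrete, here is a minimal sketch of what a "stock" set of constraints might look like. It is not the actual content of UBL-CodeList-StatusCode-Stock-strawman-1.xsd, so the line numbers cited above do not apply to it, and the namespace URI is a hypothetical placeholder. The comments note how the "placebo" and "private" variants would differ under the same assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not the delivered "stock" file.
     The target namespace URI is a hypothetical placeholder. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example:ubl:codelist:StatusCode">

  <xsd:simpleType name="StatusCodeType">
    <!-- A "placebo" variant would stop at this restriction with no
         facets, so that any token value is accepted. -->
    <xsd:restriction base="xsd:token">
      <!-- The "stock" variant enumerates the values UBL supplies. -->
      <xsd:enumeration value="Original"/>
      <xsd:enumeration value="Copy"/>
      <!-- A "private" variant would add the user's own values here,
           for example an enumeration facet for "Fraudulent". -->
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Because every variant would declare the same type name in the same target namespace, any one of them could be copied over the "in-use" file without touching the importing schema.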
Running the test.bat file produces the following results:
T:\ubl\codelist>rem The delivered code list is the placebo that validates against all token values
T:\ubl\codelist>copy UBL-CodeList-StatusCode-Placebo-strawman-1.xsd UBL-CodeList-StatusCode-Use-strawman-1.xsd
1 file(s) copied.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-stock.xml
start parsing a grammar.
validating test-good-stock.xml
the document is valid.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-private.xml
start parsing a grammar.
validating test-good-private.xml
the document is valid.
T:\ubl\codelist>copy UBL-CodeList-StatusCode-Stock-strawman-1.xsd UBL-CodeList-StatusCode-Use-strawman-1.xsd
1 file(s) copied.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-stock.xml
start parsing a grammar.
validating test-good-stock.xml
the document is valid.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-private.xml
start parsing a grammar.
validating test-good-private.xml
Error at line:13, column:46 of file:///T:/ubl/codelist/test-good-private.xml
the value is not a member of the enumeration: ("Original"/"Copy")
the document is NOT valid.
T:\ubl\codelist>copy UBL-CodeList-StatusCode-Private-strawman-1.xsd UBL-CodeList-StatusCode-Use-strawman-1.xsd
1 file(s) copied.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-stock.xml
start parsing a grammar.
validating test-good-stock.xml
the document is valid.
T:\ubl\codelist>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-strawman-1.xsd test-good-private.xml
start parsing a grammar.
validating test-good-private.xml
the document is valid.
T:\ubl\codelist>rem Done!
Note above how the private instance does not validate against the "stock" set of constraints.
Copying one file over top of another file seems cumbersome.
With XML Catalog support (http://www.oasis-open.org/committees/entity/spec-2001-08-06.html), the delivered files could be the placebo files, and the user would merely employ an XML Catalog to selectively override the public identifier of the placebo file with a reference to their choice of XSD file.
A user can choose to create an XML Catalog that specifies their own XSD fragment for each desired code list type that needs to be redefined in their environment. Invoking validation would merely point to their XML Catalog without changing any of the delivered files.
This has not been tested: doing so awaits finding a validating XSD processor that gives XML Catalog entries priority over any URI associated with the URN of a schema location.
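For illustration, a catalog along these lines might look like the following sketch. It uses the catalog namespace and the uri entry defined by the XML Catalog specification, but the identifier being overridden is a hypothetical URN, and this is not the content of the untested UBL-Codelist-Catalog-Private.xml file. Whether a given validator consults such entries when resolving schema locations is processor-dependent and remains untested here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the name attribute is a hypothetical URN standing in
     for whatever identifier the delivered schemas use to reference
     the status code constraints. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- Redirect references to the status code constraints to the
       user's private file, leaving all delivered files unchanged. -->
  <uri name="urn:example:ubl:codelist:StatusCode"
       uri="UBL-CodeList-StatusCode-Private-strawman-1.xsd"/>

</catalog>
```

A user wanting the stock constraints instead would point the same entry at UBL-CodeList-StatusCode-Stock-strawman-1.xsd.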
Some downstream processes may require supplemental information associated with each of the members described in the code list data types. An example is the display string "États-Unis" for the French-language display of the country code "US".
This document suggests a mechanism of naming in a default attribute the file name of supplemental information, where the information in the file is indexed by some means to the individual code list member values. The maintainer of the code list has free rein to describe the supplemental information in any vocabulary desired, though it is incumbent on them to sufficiently describe the vocabulary so that people writing stylesheets or other processes know how to de-reference the strings or other values they might need.
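As one possibility, a supplemental file indexed by code list member value could be as simple as the following sketch. The vocabulary is entirely hypothetical: every element and attribute name other than xml:lang is invented for this illustration, since the proposal leaves the choice of vocabulary to the maintainer of the code list.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary: codeListSupplement, member, display and
     the value attribute are invented for this illustration. -->
<codeListSupplement>

  <!-- Each member is indexed by the coded value found in the instance. -->
  <member value="US">
    <display xml:lang="en">United States</display>
    <display xml:lang="fr">États-Unis</display>
  </member>

</codeListSupplement>
```

A stylesheet would look up the member whose value matches the coded value in the instance and select the display string for the desired language.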
The defaulted attribute is described in the data type definition fragment. None of the "placebo" data type fragments would point to any supplemental information. The "stock" data type fragments may or may not be delivered with file names specified for a collection of supplemental files supplied by UBL for these data types.
Using the same mechanisms described above for overriding the data types, a user with a new or custom supplemental file would create a "private" data type schema expression that includes the new attribute value, and would configure their environment to utilize it for downstream processes to find.
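A "private" data type fragment carrying such a defaulted attribute might be sketched as follows. The attribute name, the supplemental file name, the content type name and the namespace URI are all hypothetical, and modelling the code list type as a complex type with simple content (so that it can carry an attribute) is an assumption of this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: supplementalInfo, StatusCodeContentType, the file
     name and the namespace URI are hypothetical. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:status="urn:example:ubl:codelist:StatusCode"
            targetNamespace="urn:example:ubl:codelist:StatusCode">

  <!-- The user's private set of valid values. -->
  <xsd:simpleType name="StatusCodeContentType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="Original"/>
      <xsd:enumeration value="Copy"/>
      <xsd:enumeration value="Fraudulent"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The defaulted attribute names the supplemental file; instance
       authors never need to write it themselves. -->
  <xsd:complexType name="StatusCodeType">
    <xsd:simpleContent>
      <xsd:extension base="status:StatusCodeContentType">
        <xsd:attribute name="supplementalInfo" type="xsd:anyURI"
                       default="StatusCode-Private-supplement.xml"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

</xsd:schema>
```

The default value here is a relative URI, which is exactly the case raised as a concern for downstream processors.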
The obstacle to testing this methodology is that no XSLT 2 processor yet builds its source node tree from the post-schema-validation infoset (PSVI); they build only from the information set, which does not include defaulted attributes described by W3C Schema expressions.
A concern about this approach is how downstream processors will interpret relative URI specifications for the supplemental file location. Relative specifications will need to be used because we don't know the directory into which the user installs the UBL file set.