OASIS Mailing List Archives

ubl message



Subject: Candidate approach for W3C Schema with wild cards for code list value validation


This is for Tony Coates's attention, but I am eager to hear from anyone 
else who sees flaws in the approach I document in this post.

Based on today's call, I was unclear on the restrictions of using lax 
validation in W3C Schema, so I created a test here that I believe 
illustrates the use of W3C Schema as a "second pass" value validation, 
checking *only* the value of the currency attribute *and no structure bits 
whatsoever*.

This approach would be an alternative to ISO 19757-3 Schematron, one that 
some may find more palatable.

To recap, from Tony's summary posted as a password-protected file announced in:

   http://lists.oasis-open.org/archives/ubl/200508/msg00043.html

... we are contemplating a two-pass validation, the first pass being a 
structural validation with "class 2" code list information items 
unvalidated, and a second pass where *only* "class 2" code list information 
items have their values validated.  Many use XPath and ISO 19757-3 
Schematron for this second pass, reaching into the instance and checking 
only the values.  This *necessarily* must be a second pass because the 
first pass ensures the structural integrity of the node tree into which the 
XPath expressions reach.  Without structural integrity there is no 
integrity in the XPath evaluations, and in the general case a test could 
produce false positives, among other unexpected results.
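
To make the false-positive risk concrete, here is a minimal sketch in 
Python (neither Schematron nor W3C Schema, and not part of the uploaded 
example).  The element names, the namespace URI and the two-code allowed 
set are placeholders invented for the illustration.  A path-based value 
check passes vacuously when the structure it expects is absent:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace and a deliberately tiny sample of allowed codes.
NS = {"x": "urn:example:invoice"}
ALLOWED = {"USD", "CAD"}

def bad_values(xml_text):
    """Return the currency codes found at the expected path that are not allowed."""
    root = ET.fromstring(xml_text)
    # The check reaches into a fixed location, as a path-based rule would.
    nodes = root.findall("./x:Totals/x:TaxAmount", NS)
    return [n.get("currency") for n in nodes if n.get("currency") not in ALLOWED]

# Structurally sound instance with a bad code: the check catches it.
sound = ('<Invoice xmlns="urn:example:invoice">'
         '<Totals><TaxAmount currency="XXQ">1.00</TaxAmount></Totals></Invoice>')

# Same bad code, but the element is misplaced: the path selects nothing,
# so the check reports no problems even though the value is wrong.
broken = ('<Invoice xmlns="urn:example:invoice">'
          '<TaxAmount currency="XXQ">1.00</TaxAmount></Invoice>')

print(bad_values(sound))   # ['XXQ']
print(bad_values(broken))  # []
```

The second result is the false positive: the value test is only 
trustworthy once a structural pass has confirmed the element is where the 
path expects it.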

The question today was:  for those who cannot use ISO 19757-3 Schematron, 
how would one use W3C Schema technology for the second pass, where only the 
code list information item values are being checked against an enumeration?

I've uploaded an unprotected .ZIP with an example.  Because of my limited 
time it uses currency, even though I know that currency is "class 1" and 
isn't one of the "class 2" lists we are going to extend, but it shows the 
principles I had in mind:

http://www.oasis-open.org/committees/download.php/13998/codelist-xsd-gkholman-20050811-1940z.zip

I have two test UBL 1.0 instances:  test.xml is a copy of the office invoice 
instance, and testbad.xml is the same data with an invalid currency code.

I changed "codelist\UBL-CodeList-CurrencyCode-1.0.xsd" to be an unrestricted 
normalized string with all of the required attributes.  (We talked in the 
room about using NMTOKEN; I'm not sure why 1.0 used normalized string and 
not NMTOKEN, so I left it as normalized string.)  Thus, with 
"maindoc\UBL-Invoice-1.0.xsd", the structure of cbc:TotalTaxAmount will be 
validated, but its currency code may be any normalized string, thus 
accomplishing the structural validation without value validation.

I then created "maindoc\CL-Invoice-1.0.xsd", which allows anything anywhere 
but with lax validation, so that wherever a declaration is present for an 
item, that item is validated.  This schema imports:

(1) - "codelist/CL-CodeList-CurrencyCode-1.0.xsd" that defines the type as 
allowing any attributes but with an enumeration of the allowed values

(2) - "common\CL-CommonBasicComponents-1.0.xsd", which only has a 
declaration for cbc:TotalTaxAmount, so it is the only item that will be 
validated

No other changes were made.  From what I can tell, 
"maindoc\CL-Invoice-1.0.xsd" is a schema that allows any structure anywhere 
in the instance, but constrains the value of cbc:TotalTaxAmount (found 
anywhere) to be from an enumerated list.
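
The intended effect can be modelled outside of W3C Schema in a few lines 
of Python.  This is only a sketch of the behaviour described above, not of 
how a schema processor works internally.  The namespace URI, the rule 
table and the sample codes are assumptions made for the illustration; the 
attribute name amountCurrencyID is the one reported in the test output 
below.

```python
import xml.etree.ElementTree as ET

CBC = "urn:example:cbc"  # placeholder, not the real UBL namespace

# Only declared elements are checked: {qualified name: {attribute: allowed values}}.
DECLARED = {
    f"{{{CBC}}}TotalTaxAmount": {"amountCurrencyID": {"USD", "CAD", "EUR"}},
}

def lax_value_errors(xml_text):
    """Walk every element at any depth; skip undeclared ones, check declared ones."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        rules = DECLARED.get(elem.tag)
        if rules is None:
            continue  # lax: no declaration, so nothing to validate here
        for attr, allowed in rules.items():
            value = elem.get(attr)
            if value is not None and value not in allowed:
                errors.append((elem.tag, attr, value))
    return errors

# Arbitrary structure; only cbc:TotalTaxAmount has a declaration.
doc = (f'<Anything><Wrapper xmlns:cbc="{CBC}">'
       '<cbc:TotalTaxAmount amountCurrencyID="XXQ">1.00</cbc:TotalTaxAmount>'
       '<cbc:Other amountCurrencyID="XXQ"/>'
       '</Wrapper></Anything>')

for tag, attr, value in lax_value_errors(doc):
    print(f'{tag}: attribute "{attr}" has a bad value: {value}')
```

Only the declared element is reported; the undeclared cbc:Other and the 
surrounding wrapper elements pass through unchecked, whatever their 
structure.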

The "test.bat" file assumes there is a "w3cschema.bat" file on the path 
that validates instances using the model expressed in the first 
argument.  In my environment, I run Sun MSV for my W3C Schema validation (I 
haven't tested this with any other W3C Schema processor).  The output below 
appears to indicate that my approach to non-structural value-only 
validation with W3C Schema expressions works:

===8<---
T:\test2>test

T:\test2>call w3cschema xsd\maindoc\UBL-Invoice-1.0.xsd test.xml
No validation errors.

T:\test2>call w3cschema xsd\maindoc\UBL-Invoice-1.0.xsd testbad.xml
No validation errors.

T:\test2>call w3cschema xsd\maindoc\CL-Invoice-1.0.xsd test.xml
No validation errors.

T:\test2>call w3cschema xsd\maindoc\CL-Invoice-1.0.xsd testbad.xml
start parsing a grammar.
validating testbad.xml
Error at line:58, column:84 of file:///T:/test2/testbad.xml
   attribute "amountCurrencyID" has a bad value: the value is not a member of the enumeration.

the document is NOT valid.

T:\test2>
===8<---

It would appear that I do not need a wrapper element from another 
namespace, as was discussed during the call.  Have I messed up 
somewhere?  I suspect this is a design pattern for any kind of value-only 
validation we need to implement.

I suppose a complete solution would have three parts: 
"common\CL-CommonBasicComponents-1.0.xsd" would declare only those elements 
that have a type from the class-2 code lists; the "codelist\CL-*.xsd" files 
would have all the enumerations synthesized from the code list value 
instances; and the "maindoc\CL-*.xsd" files would import the codelist files 
with the desired enumerations.
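
As a rough idea of the synthesis step, the following sketch generates an 
enumerated simple type from a list of codes.  It is written in Python 
rather than the XSLT mentioned below, only for brevity.  The type name, 
the function name and the three sample codes are made up for the example, 
the base type follows the normalized string discussed above, and the 
format of the code list value instances is not modelled.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
# Serialize with the "xsd" prefix so the QName in base="xsd:..." resolves.
ET.register_namespace("xsd", XSD)

def enumeration_type(type_name, codes):
    """Build an xsd:simpleType restricting xsd:normalizedString to the given codes."""
    simple = ET.Element(f"{{{XSD}}}simpleType", {"name": type_name})
    restriction = ET.SubElement(simple, f"{{{XSD}}}restriction",
                                {"base": "xsd:normalizedString"})
    for code in codes:
        ET.SubElement(restriction, f"{{{XSD}}}enumeration", {"value": code})
    return simple

# Hypothetical type name and a three-code sample, not a real code list.
simple_type = enumeration_type("SampleCurrencyCodeType", ["CAD", "EUR", "USD"])
print(ET.tostring(simple_type, encoding="unicode"))
```

The printed fragment would still need to be placed inside an xsd:schema 
element with the appropriate target namespace before it could be imported.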

My reference to "my limited time" above is because I really have to focus 
on the HISC output specifications and I don't have much time to help out 
more on the code list stuff, but I did commit today to documenting my ideas 
for a W3C Schema-based second-pass methodology as I hope I have done in 
this note.  I can work later on the XSLT to synthesize the enumerations if 
you wish, Tony, or leave it with you and commit to help you with any 
questions you may have.

Please let me know if anyone has any questions.

. . . . . . . Ken

--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/o/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal


