Subject: Re: [ubl] Schematron demo
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: ubl@lists.oasis-open.org
Date: Tue, 13 Sep 2005 22:20:00 -0400

At 2005-09-07 08:25 -0400, Burnsmarty@aol.com wrote:
>Hey Ken, comments on your comments to my comments inline.
>
>Hope this helps,

I appreciate the dialogue, Marty, thank you.

Unfortunately, I think some of the quoting over many messages has 
been munged because you are attributing some comments to me that 
aren't from me.

>In a message dated 8/30/2005 8:27:23 A.M. Eastern Daylight Time, 
>gkholman@CraneSoftwrights.com writes:
>...
>At 2005-08-30 07:53 -0400, Burnsmarty@aol.com wrote:
>...
> >UBL schemas don't have to accommodate code list extensibility.
>
>Yes, I believe this is a requirement:

What you've quoted isn't something I've typed, it is something found 
in your message as coming from you (as I interpret it):

   http://lists.oasis-open.org/archives/ubl/200508/msg00177.html

I believe it is a requirement for "type 2 schemas" (not defined by 
UBL or ATG), but I do not believe "type 1 schemas" (those hardwired 
by UBL or ATG) need to be expressed by a mechanism that allows extensibility.

A two-stage validation using Schematron does address extensibility 
for type 2 schemas, and indeed offers trading partners a way to 
validate the use of a subset of any type 1 code list should they wish.

> >More Problematic:
> >The schematron tests for attribute name and therefore bypasses the
> >use of the context of the schema -- therefore it can ignore the
> >schema's constraints. For example, what if the schema designer wants
> >to constrain the values in a particular usage but not others.
>
>XPath addresses this.  Note in Jon's demonstration how the
>"currency-value" rule is declared to be abstract and is only made
>concrete by the use of a non-abstract rule with a context=
>attribute.  The demo uses a wildcard context="*[@amountCurrencyID]"
>but could just as easily use context="cbc:TotalTaxAmount"
>instead for a more focused context.
>
>Your approach will allow the namespace identifier to be searched

I'm not sure what you mean by "namespace identifier to be searched".

>but does not enforce any constraints imposed in the schema itself.

Correct, it is the role of the schema to enforce constraints on the 
position in the structure.  The schema, however, would only have 
base="xsd:token" for the value of the code list type 2 item which is 
then validated in the second pass using Schematron.

As for your question "what if the schema designer wants to constrain 
the values in a particular usage but not others": the Schematron 
expression can have two different assertions, one for that usage and 
one for all the others, each checked against different values.  I 
don't believe that can be done with W3C Schema, as the code list 
value will have only one declaration for all uses and so cannot 
support the contextual differences that Schematron can.
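
A minimal sketch of two assertions for two usages, assuming ISO 
Schematron syntax.  The rule id "tax-currency-value", the value USD, 
the narrower value set for TotalTaxAmount and the namespace URI are 
illustrative assumptions, not taken from Jon's demonstration.  Within 
a pattern a node fires only the first rule whose context matches it, 
so the focused context is listed before the wildcard:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- placeholder URI (assumption): substitute the real namespace
       bound to the cbc: prefix in the instance -->
  <ns prefix="cbc" uri="urn:example:placeholder-cbc"/>

  <pattern>
    <!-- abstract rules carry value sets but no context of their own -->
    <rule abstract="true" id="tax-currency-value">
      <assert test="@amountCurrencyID = 'CAD'"
        >Tax amounts must be expressed in CAD.</assert>
    </rule>
    <rule abstract="true" id="currency-value">
      <assert test="@amountCurrencyID = 'CAD' or
                    @amountCurrencyID = 'EUR' or
                    @amountCurrencyID = 'USD'"
        >Currency is not in the agreed code list.</assert>
    </rule>

    <!-- concrete rules: one usage gets the narrow set ... -->
    <rule context="cbc:TotalTaxAmount[@amountCurrencyID]">
      <extends rule="tax-currency-value"/>
    </rule>
    <!-- ... and every other usage gets the general set -->
    <rule context="*[@amountCurrencyID]">
      <extends rule="currency-value"/>
    </rule>
  </pattern>
</schema>
```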

>I suspect you can't be faithful to the schema with this method 
>without reconstructing the schema itself with schematron rules.

There is no need to duplicate what is already validated 
structurally.  Provided there is awareness (in XML) of all of the 
elements and attributes expressed by type 2 code lists then that XML 
can be processed to create the contexts for the abstract rules.

Tony, you demonstrate the creation of the abstract Schematron rules 
from the genericode instance, but I don't see where you specify the 
contexts in which codes are being used.  How would we express *where* 
in UBL the currency code lists are being used for elements and 
attributes so that the concrete rules pointing to abstract rules will 
have the correct XPath match patterns? (ed: I answer this myself 
below, but I'll leave the question since others might be asking 
themselves this question)

>Otherwise there will need to be specific NDR rules on the use of 
>code lists that prevent ambiguity when schematron is used as the 
>validation method.

Which ambiguity?  Trading partners could agree that currency codes 
allowed for one part of an instance must be different than those 
allowed in another part of the instance, and each of these parts 
would be contexts for the two different abstract rules.


> >Approach is namespace insensitive.
>
>Only the demonstration is namespace insensitive; one could just as
>easily use context="cbc:*[@amountCurrencyID]".
>
>Same comment as previous.

Do you mean the comment about reconstructing the schema?  If so, then 
I don't understand.

> >How does one detect that an instance file is based on an extended code list?
>
>Business rules and the choice to apply a given set of assertions.
>
>So you are saying there is no way to determine by inspecting the 
>instance file, what composition rules it is expected to follow?

What is a "composition rule" in this context?  If the business rules 
state that one subset of a code list applies in one place and a 
different subset of a code list applies in a different place, then 
trading partners express two code lists and point each context to the 
abstract rules accordingly.

Again, I'm assuming the allowed values of type 1 code lists are fixed 
in the schema, though subsettable by trading partners in part 2, and 
allowed values of type 2 code lists are permissive tokens in the 
schema requiring trading partners to specify the agreed-upon code 
list values in a code list instance that generates the Schematron 
assertions (as in Tony's example).
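
To illustrate the distinction I am assuming, here is a sketch of the 
two kinds of declaration in W3C Schema.  The type names and the 
enumerated values are hypothetical and do not come from the UBL 
schemas:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- type 1 (hypothetical name): the allowed values are hardwired
       in the schema as an enumeration -->
  <xsd:simpleType name="HardwiredCodeType">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="CAD"/>
      <xsd:enumeration value="EUR"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- type 2 (hypothetical name): any token passes the schema, so
       the agreed values must be checked in the second pass -->
  <xsd:simpleType name="PermissiveCodeType">
    <xsd:restriction base="xsd:token"/>
  </xsd:simpleType>

</xsd:schema>
```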

Though I need an answer from Tony regarding specifying the different contexts.

> >How can one tell in the instance file what code lists are being used?
>
>One could base the context on the presence of the code list support 
>attributes:
>
>     context="*[@amountCurrencyID][@amountCurrencyCodeListVersionID='0.3']"
>
> >How does one detect that the correct version of the code list is being used?
>
>Specific values can be checked with the above, or one could report an
>error that a given version isn't being used:
>
>     <assert test="@amountCurrencyCodeListVersionID='0.3'">
>
>So what I think you are saying is that the schematron becomes an 
>integral part of the schema itself and has to be part of the UBL 
>packaging and versioning. This schematron would refine what the 
>schema says is required.

An integral part of validation, not an integral part of the schema 
[expression].  Yes, the Schematron expressions may choose to limit 
the use of values in type 1 code lists (since the schema allows all 
values the committee defines and partners may not want all values 
available) and should express the limited use of values in type 2 
code lists (since the schema doesn't specify any values for type 2 code lists).

>How do we prevent the schematron from conflicting with the schema 
>since both check content based on overlapping rules? I think this 
>collision doesn't occur when schema is used for individual value 
>validation and schematron is used for validating values based on other values.

The collision also does not occur when the schema allows a value set 
for type 1 code lists and a permissive wildcard token for type 2 code 
lists and Schematron is used for validating values based on a given 
context or any context.

> >Where is the code list itself?
>
>It is an instance of the upcoming code list schema and is input to an
>automated process to produce the .sch set of assertions.
>
>Does this mean that a third party needs to construct an XML file for 
>use by UBL and schemas for all other ebusiness standards? I was 
>trying to devise a mechanism by which third parties could construct 
>a single reference and it could be used by all.

There would be a single reference ... perhaps it is Tony's genericode 
expression of code list values.

In fact, you just answered one of my questions for Tony.  The 
genericode expression *can't* have UBL contexts because it is just an 
expression of code values for all document models.

I will look into an analysis of the UBL schemas to see how that, 
combined with the genericode expressions, would produce a Schematron 
expression of the abstract genericode lists in the contexts described 
by the schemas.  I have in mind how it would be done, so I'll give it a try.

> >What does an extension document look like?
>
>What is an "extension document"?  In Schematron one merely enumerates
>the desired conditions that must be true or the desired conditions
>that must not be true.
>
>An extension document is a description of changes that a user makes 
>to the standard schemas that allows for the testing of conformant 
>documents without having to alter the underlying standard schemas. 
>The extension mechanism would allow information represented in the 
>schemas to be "extended".

Then the extension document is described along the lines of Tony's 
genericode example.  If *all* standards committees were geared to 
work with genericode instances of code lists, then these become the 
focal points of trading partner agreements for code list validation 
across all applications.

It would be wonderful if trading partners could write their own 
genericode instances of code lists and then run three or four 
different applications all using their agreed-upon values without 
having to write custom ones for each application.

> >What does a restriction document look like?
>
>What is a "restriction document"?
>
>Are you speaking of expressions that extend or restrict an instance
>of the upcoming code list schema?  I would leave such a question to
>Tony in regard to his proposed code list instance expression.
>
>Restrictions are the corollary to extensions that might permit 
>trading partners to agree that, for example, 
>PricingCurrencyCode must have a value of EUR.

Okay, then same answer: genericode instances agreed upon by trading partners.

An extension context for genericode use would be for type 2 code 
lists that are permissively defined by the schema and don't have the 
values needed by trading partners.

A restriction context for genericode use would be for type 1 code 
lists that are hardwired by the schemas and have more values than 
are needed by trading partners.
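
As a sketch of such a restriction, using Marty's example, the 
assertion below limits PricingCurrencyCode to EUR even though the 
schema would allow the full list.  The cbc: prefix on the element and 
the namespace URI are assumptions made for illustration:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- placeholder URI (assumption): substitute the real namespace -->
  <ns prefix="cbc" uri="urn:example:placeholder-cbc"/>

  <pattern>
    <!-- the code is the element's own value, so test the current node -->
    <rule context="cbc:PricingCurrencyCode">
      <assert test=". = 'EUR'"
        >PricingCurrencyCode must be EUR under this agreement.</assert>
    </rule>
  </pattern>
</schema>
```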

> >How is versioning handled?
>
>Versioning of what?
>
>Versioning of the schematron documents which are required to 
>validate the instance documents.

The Schematron expressions would be synthesized from the UBL schemas 
and the genericode instances, so trading partners would agree on the 
versions of each of those and the synthesis of the validation 
wouldn't, itself, need to be versioned.  It could evaporate after 
use, or it could be cached until any of the inputs are changed.  I 
don't see that the Schematron expression would need to be versioned.

I think you and others are still under the impression that the 
Schematron expressions are authored and persistent.  They are 
synthesized and possibly cached, but I do not believe they are 
persistent with their own identity.
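
To make "synthesized" concrete, here is a rough XSLT 1.0 sketch, 
assuming a made-up input format since the real inputs are not yet 
decided.  The <codelist>, <code> and <context> elements and the id 
and subject attributes are hypothetical and are not genericode.  The 
sketch does not emit <ns> declarations for prefixes used in contexts 
and does not handle code values containing an apostrophe:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input (an assumption, not a real format):
       <codelist id="currency-value" subject="@amountCurrencyID">
         <code>CAD</code>
         <code>EUR</code>
         <context>*[@amountCurrencyID]</context>
       </codelist>
  -->
  <xsl:template match="/codelist">
    <sch:schema>
      <sch:pattern>
        <!-- one abstract rule holding the whole value set -->
        <sch:rule abstract="true" id="{@id}">
          <sch:assert>
            <!-- build: subject = 'A' or subject = 'B' ... -->
            <xsl:attribute name="test">
              <xsl:for-each select="code">
                <xsl:if test="position() &gt; 1"> or </xsl:if>
                <xsl:value-of select="../@subject"/>
                <xsl:text> = '</xsl:text>
                <xsl:value-of select="."/>
                <xsl:text>'</xsl:text>
              </xsl:for-each>
            </xsl:attribute>
            <xsl:text>Value is not in code list </xsl:text>
            <xsl:value-of select="@id"/>
          </sch:assert>
        </sch:rule>
        <!-- one concrete rule per context, pointing at the abstract rule -->
        <xsl:for-each select="context">
          <sch:rule context="{.}">
            <sch:extends rule="{../@id}"/>
          </sch:rule>
        </xsl:for-each>
      </sch:pattern>
    </sch:schema>
  </xsl:template>

</xsl:stylesheet>
```

Because the output is regenerated whenever an input changes, nothing 
in it needs to be edited by hand or given a version of its own.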

> >Still need to develop:
> >xslt that translates code list in XML to schematron form.
>
>When we decide what the code list instance looks like, and then what
>a proforma Schematron set of assertions looks like, the XSLT should
>be easily created.  It would be a waste to start writing stylesheets
>before knowing the inputs and the outputs.  I don't have any concerns
>that if we know the inputs and the outputs that an XSLT can be
>created to use for illustration.
>
>The question is, how complex will a schematron document be that is 
>faithful to the schema and completely validates all code lists in UBL usage.

Let me see if I can derive that from the schemas and the genericode 
instances and illustrate it in an automated fashion.

I believe the complexity is totally irrelevant since it isn't a 
persistent and authored artifact ... it is synthesized and cached and 
need never be manipulated.

> >Examples of schematron to handle type 1 (code list as element)
>
>My recollection of "type 1" is that being an element is not an aspect
>of its distinction.
>
>However, because everything is XPath, one can just assert
>test=".='CAD'" instead of test="@amountCurrencyID='CAD'" ... and as I
>mentioned earlier the implementation of Schematron I'm using appears
>to be old and deficient and I'm trying to find an implementation of
>ISO Schematron where the tests can all be based on the current node
>and the contexts can be set to either an element or an attribute.
>
>I will point out here that one distinction of type 1 usage is that 
>the name of the element is different within the context of the 
>schema. This is slightly more complex than the type 2 case which 
>uses an attribute that always has the same name. There is no 
>requirement in UBL, I think, that says that element names must be 
>globally unique within a namespace.

But I think it is a W3C Schema constraint that sibling elements of 
the same name cannot be structured differently in the schema, and 
globally defined elements of a given name can only be declared once.

So, yes, I think by use of W3C Schema that the structures for element 
names must be globally unique within a namespace.

For trading partners that want one element X to have one set of 
values in context Y and a different element X to have a different 
set of values in context Z, that can only be validated using Schematron.

Though of course none of the above is true for RELAX-NG as it can 
have siblings that are different and the declaration mechanisms are 
more powerful ... but I digress ....

> >Construction of schematron to handle all validation of ubl instance
> >documents.
>
>Schematron is only tasked with value validation and not structural
>validation, though if necessary, Schematron can be used to constrain
>structural validation that may be loosely defined in other schema expressions.
>
> >How complex will this be? Are there any processing resource demands
> >imposed by this method?
>
>I'm not sure what you are asking.
>
>We are looking so far at simplified examples. To judge overall 
>complexity we need a fully done example that validates instance 
>files of the standard UBL examples such as order.

I now agree ... but I could use more fodder to help with my example building.

Where have we catalogued all of the code lists as being either type 1 
or type 2, and who can help create genericode instances of as many of 
the code lists as possible?  Can this catalogue be expressed in XML 
so that I can use it as input to a transformation?

Then I can use those inputs with my ideas for synthesizing the 
Schematron.  I believe I can use the HISC XPath files, which 
themselves are synthesized from the schemas, as the input to tell me 
all of the contexts in which code lists are used.

To create a full example for Order I will need this help ... and any 
help thus provided will be useful in the final deliverable anyway, so 
it won't be a waste just for a demo, it will be useful as draft 
versions for a final deliverable.

Thanks again, Marty!  And thanks to anyone who can help me with these 
code list artifacts (genericode instances and code list type 
catalogue) I need while I write the XSLT for the Schematron.

. . . . . . Ken

--
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/o/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal