
Subject: Re: [ubl] Code list value validation methodology (version 0.3)
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: Universal Business Language <ubl@lists.oasis-open.org>
Date: Wed, 25 Jan 2006 00:08:00 -0500
Thank you, Tim, for your supportive comments on my proposal.  I 
raised this scenario in a plenary session at the Ottawa face-to-face 
in August 2005, but due to my personal challenges I was unable to 
write it up until December, so I appreciate that you and the others 
have had the patience to wait for me to finally put those ideas down 
for consideration.

At 2006-01-25 11:39 +0800, Tim McGrath wrote:
>Ken, I have a few comments but they are not to detract from what is 
>written, just some thoughts that may help relate these ideas to 
>previous code list validation approaches.

I welcome any and all comments and discussion to improve on the 
content of this.  We haven't heard any comments yet from the UBL-Dev community.

>In section 3.2 you perfectly describe how partners may wish to 
>constrain sets of values (such as USD and CAD), but we should also 
>recognize that extending the sets is a requirement too.  Sometimes 
>they just have to add a value that isn't in the set, as when new 
>currency codes are issued that have not made it to the 'standard 
>set'.  Basically it is safer to assume that no codelist is stable 
>and no set of values is fixed over time.  I believe your approach 
>would work with this requirement as well.

Partially, but there is a dependency issue that might impose some 
changes on how we declare code lists in the schemas.  Consider that 
my proposed methodology requires an instance to first pass the
official UBL W3C Schema constraints, confirming that all of the
information items are structurally in place, before the instance
can be checked against a code list context association file for the
values found in those places.

Since the UBL enumerations would not include the extended values, an
instance using an extended value would fail the first pass before
value validation is ever executed.  If we can't complete the first
pass, the integrity of the second pass is in question, so I think
running the first pass is mandatory.  Hence the problem.

However, if we changed the way UBL declares code lists to be defined 
solely on a lexical basis without any enumeration of any values (very 
radical suggestion and probably not palatable to many people), then 
not only would the code list context association file work for all 
code lists for all sets (public, private, restricted, extended, 
alternative, etc.), but it would probably end up being a mandatory
step in the validation, not an optional step.  I have no problems
with this from a geek's perspective of ensuring the values are
correctly defined for trading partners at the end of all validation
processes (a basic tenet of Document Schema Definition Languages
(DSDL)), but some might consider it heretical that value
enumerations would have no role in the schema expressions.  The schema
expressions end up being solely structural validation, and the code 
list context association files end up satisfying all requirements for 
value validation.  I personally don't think that is a problem, but 
many people might have strong opinions that schema-expressed 
enumerations are sacrosanct and necessary.
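To make the dependency concrete, here is a minimal Python sketch under stated assumptions: structural_pass is a hypothetical stand-in for W3C Schema validation (not the actual UBL schemas), the three-value enumeration is illustrative only, and "XYZ" is a made-up code standing in for a newly issued value that has not yet reached the standard set.  It shows a schema enumeration rejecting an extended value before the value pass can run, while a purely lexical declaration lets the trading partners' agreed set decide.

```python
import re

# Illustrative enumeration only; the real UBL currency code list is much longer.
SCHEMA_ENUMERATION = {"USD", "CAD", "EUR"}


def structural_pass(value: str, enumerate_values: bool) -> bool:
    """Hypothetical stand-in for the first (W3C Schema) pass."""
    # Lexical check: a non-empty value with no whitespace (assumed rule).
    if re.fullmatch(r"\S+", value) is None:
        return False
    # A schema that enumerates values also rejects anything not listed.
    if enumerate_values:
        return value in SCHEMA_ENUMERATION
    return True


def value_pass(value: str, partner_set: set) -> bool:
    """Second pass: check the value against the trading partners' agreed set."""
    return value in partner_set


def validate(value: str, partner_set: set, enumerate_values: bool) -> str:
    # The first pass is mandatory; the second pass runs only if it succeeds.
    if not structural_pass(value, enumerate_values):
        return "rejected by structural pass"
    if not value_pass(value, partner_set):
        return "rejected by value pass"
    return "accepted"
```

With enumerate_values=True, validate("XYZ", {"USD", "CAD", "XYZ"}, True) is rejected by the structural pass even though the partners agreed on XYZ; with enumerate_values=False the same value is accepted, and restriction (e.g. EUR against {"USD", "CAD"}) is still caught by the value pass.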

If the UBL schemas changed in this fashion, the specification would
then argue that, for extensible controlled
vocabularies, schema expressions must be used solely for structural
validation and that controlled vocabulary value validation must be
done as a separate process (proposed in the document to be the code 
list context association expressions).  The UBL schemas would be 
changed to be solely token values (or normalized strings, or 
whatever) without any enumerations for any of the code lists, and the 
UBL packaging would then include normative genericode files and a 
normative code list context association file that would mimic what we 
are now doing with schema-expressed enumerations for all of the code 
lists ... this would also provide trading partners the raw material 
for the genericode and code list context association files they 
create and exchange between themselves (with the sets of values being 
a subset, full set, superset or alternate set of the values we 
package with UBL).

Other projects with enumerations in their schemas could still point 
to this methodology as a way of doing value validation, but without 
extension or alternative values ... only with restriction.  To get 
the extension, the base schema expressions must not have any 
enumerations, or the first pass (which is mandatory) is put in 
jeopardy.  One aspect of my proposed methodology that might support 
UBL and other projects totally abandoning schema-expressed 
enumerations of values is that the methodology allows different
sets of values from the same code list to be specified in different
document contexts.  Schema-expressed enumerations have only
document-wide context and that may not be appropriate for some 
trading partner agreements.

But I think I might be able to hear the hue and cry already from others.

Oh, if someone should argue that W3C Schema is a standard and 
Schematron is not a standard, Schematron has been standardized as 
ISO/IEC 19757-3 (part of the DSDL family of standards).

>In section 5.3 (para. 2) you say "order invoice".  Did you mean the
>Order-to-Invoice procurement scenario or is it a typo?

A typo; it should read:  "all country sub-entity coded values used in 
the order and invoice shall be valid states according to the United 
States postal service."  I was trying to express that the same 
constraints might apply to more than one document type ... which
relates to the example at the very end of section 6.1, where I
illustrate how one could express the same constraints or different
constraints for two or more instances of different document types in
a single code list context association file.  I didn't want to leave the impression
that for 20 document types we needed 20 context association files ... 
we can have one with simple XPath addresses for constraints across 
all document types and fully-qualified XPath addresses for 
constraints restricted to a given document type.
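As a rough illustration of one association file serving several document types, the following Python sketch uses xml.etree.ElementTree paths with unqualified element names.  This is an assumption for illustration, not the proposal's actual syntax (which is expressed with XPath and Schematron), and real UBL instances are namespace-qualified.  A rule with no document type behaves like a simple address applied to all document types; a rule naming a root element behaves like a fully-qualified address restricted to that document type.  The rule list and the chosen state codes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical association rules: (document type, path, allowed values).
# A document type of None stands in for a "simple" address that applies to
# every document type; a named root element stands in for a fully-qualified
# address restricted to that document type.
RULES = [
    (None, ".//CountrySubentityCode", {"NY", "CA", "TX"}),
    ("Invoice", "./SellerParty/Address/CountrySubentityCode", {"NY"}),
]


def check(xml_text: str, rules=RULES) -> list:
    """Return one message per value not allowed in its context."""
    root = ET.fromstring(xml_text)
    errors = []
    for doc_type, path, allowed in rules:
        if doc_type is not None and root.tag != doc_type:
            continue  # rule is restricted to a different document type
        for elem in root.findall(path):
            if elem.text not in allowed:
                errors.append(f"{root.tag}: {path} = {elem.text!r}")
    return errors
```

An Invoice whose seller address carries CA produces a single error from the invoice-specific rule, while the same value in an Order passes, because the narrower rule does not apply to that document type.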

>The last paragraph of section 5.3 describes the XML equivalent of
>what EDI calls an Implementation Guide, Message Implementation Guide 
>(or MIG).  It might make sense to some of us if you say this.

I am unfamiliar with that, so if someone could either point me to 
where I can learn about it, or suggest prose to add to the content, 
that would help.

>Section 6.1 reads to me like some of the material in the OASIS 
>Context Assembly Mechanism (CAM).  Is anyone able to say how CAM
>relates to this approach?

I am unfamiliar with any CAM details as I have not had the luxury of 
time to even crack open the cover and look inside.  If CAM already 
accommodates document context and external references and their 
association, perhaps it is unnecessary for this proposal to include 
yet another document type for the code list context association files.

>In Section 7.1 we should note that the names of UBL codelist 
>metadata (attributes) have changed.  I think your examples are based 
>on UBL 2.0 attribute names and anyone looking at UBL 1.0 may wonder.

I understood them to be based on UBL 1.0, as I reference the file 
UBL-CodeList-CurrencyCode-1.0.xsd in my post:

   http://lists.oasis-open.org/archives/ubl/200511/msg00064.html

I will review the UBL 1.0 code list meta data names to check to see 
if I messed up.  In one of our phone calls, this was the issue I
raised with the NDR folks, yet they are waiting for guidance from "the
code list folks" on what the meta data names should be.  I don't
consider myself "one of the code list folks" because I wasn't 
involved in the creation of the original set of code list schema 
expressions and their meta data, nor am I familiar with the business 
issues related thereto.  I anticipated having to rewrite the value 
validation specification to accommodate the decisions made for UBL 2 
regarding naming code list meta data information items.  I wrote this 
specification up as a standalone document that works (with requisite 
name changes) however "the code list folks" finally choose to name 
the constructs.  All I proposed in the Ottawa meeting was that a 
mechanism such as I've described here could be useful between trading 
partners who need a formalism for particular sets of code list values 
in a controlled vocabulary and the association of those values with 
document contexts on which they agree.

Thank you very much again for this valuable input into the document, 
Tim ... I'm anxious to hear how we can improve on it to meet 
expectations and requirements.  I look forward to suggestions and 
input from others.

. . . . . . . . . . . . . . Ken

--
Upcoming XSLT/XSL-FO hands-on courses:  Denver,CO March 13-17,2006
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/o/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal