ubl-dev message

Subject: Re: [xml-dev] Still banging on about extensibility and validation
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: "Fraser Goffin" <goffinf@googlemail.com>
Date: Wed, 03 May 2006 22:09:45 -0400
At 2006-05-03 18:28 +0100, Fraser Goffin wrote:
>Ken,
>
>a while back you made this interesting observation as part of a larger
>conversation about message validation :-
>
>>There are some who hold with a traditional view that the entire
>>instance *has a model*, rather than the different view that sets of
>>labeled information found in an instance *each have their own model*.
>>And those sets are identified unambiguously through the use of
>>namespace-rich labels.
>
>>Accommodating "the entire XML instance has a model" is, I believe,
>>more difficult, time consuming and frustrating than accommodating
>>"each set of information found in an XML instance has its own model".
>
>I really want to believe this (in the practical sense of it being
>implementable - now)

I believe it is.  A new open-source implementation of ISO/IEC 19757-4 
NVDL has been announced and I added reference to it to the 
http://www.nvdl.org home page.  I have been told of others being in 
the works.  I have already added it to my training material and will 
be teaching it this summer in an XML document modeling introduction 
at a conference.

>but there are a few pieces missing from my
>understanding which are still troubling me, perhaps you (and others)
>can help me out (the recent UBL SBS discussions are also surfacing
>some really interesting commentary).

(I agree ... about the commentary, not about you having any missing pieces!)

>Anyway, as you had mentioned to me earlier wrt UBL code list value
>validation, the validation processing *must* first test that a message
>is compliant to the structural model (i.e. schema - maybe) before
>value-based validation is attempted. I think the reason was/is to
>confirm that the expected locations for values are present ?

Indeed ... otherwise the second test is meaningless.  If an assertion
in the second test passes only because the information item it would
have checked is missing or in the wrong place, you get a "false
positive".
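
A minimal Python sketch of why the order matters. Everything in it is
an assumption made for illustration: the Invoice/Currency document
shape, the code list, and the hand-written structural check that
stands in for a real schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical code list and document shape, for illustration only.
CURRENCY_CODES = {"CAD", "GBP", "USD"}

def structurally_valid(root):
    """Stand-in for a schema check: Invoice with exactly one Currency child."""
    return root.tag == "Invoice" and len(root.findall("Currency")) == 1

def bad_code_values(root):
    """Value check: Currency values, in the expected place, not in the list."""
    return [e.text for e in root.findall("Currency")
            if e.text not in CURRENCY_CODES]

def validate(xml_text):
    """Run the structural test first, and the value test only if it passes."""
    root = ET.fromstring(xml_text)
    if not structurally_valid(root):
        return "structure invalid"
    bad = bad_code_values(root)
    return "bad code values: %s" % bad if bad else "valid"

# The bad value sits in the wrong place, so the value test never sees it.
MISPLACED = "<Invoice><Note><Currency>XXX</Currency></Note></Invoice>"

if __name__ == "__main__":
    print(bad_code_values(ET.fromstring(MISPLACED)))  # value test alone
    print(validate(MISPLACED))                        # structure first
```

Run on its own, the value test reports nothing wrong with the
misplaced item; only the structural test that precedes it catches the
problem.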

>Also, that UBL schema do *not* use the xs:any or xs:anyAttribute
>wildcard mechanisms for extensibility,

I can now say *did not* because recent work concluded that it will be 
allowed in UBL 2 with a new element called UBLExtension:

   http://lists.oasis-open.org/archives/ubl/200604/msg00077.html

>and in one sense UBL does not
>support structural extensibility by trading partners (although
>restriction is). This may be not entirely accurate so 'flame away'.

I think it more accurate to say that W3C Schema redefine does not 
allow what is needed.

>It may also be a bit unfair of me to put this sort of question to you
>directly since you have already said you're really involved in the
>code list stuff for UBL rather than the general schema work - so apols
>for that :-)

Not a problem ... I am taking a shared lead position in both task
groups (which unfortunately puts at risk the output forms work of the
HISC that I chair ... I co-chair the SBSC work with Stephen Green, who
does the heavy lifting for that subcommittee).

>So, this got me thinking about how, on the one hand we want to enable
>trading partners to not be constrained in terms of any additional
>'private' information items they may want to exchange, whilst at the
>same time benefit from the broader reach of standards compliance.

Now private information can be placed in the UBLExtension element, 
provided the immediate children of the element are not in the UBL 
namespace (though other descendants may be).
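
The rule for immediate children could be checked along these lines.
This is only a sketch: the namespace URIs are placeholders rather than
the real UBL namespace, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

UBL_NS = "urn:example:ubl"  # hypothetical placeholder, not the real UBL namespace

def namespace_of(elem):
    """Return the namespace URI from ElementTree's '{uri}local' tag form."""
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""

def extension_children_ok(root):
    """True if no immediate child of any UBLExtension is in the UBL namespace."""
    for ext in root.iter("{%s}UBLExtension" % UBL_NS):
        for child in ext:
            if namespace_of(child) == UBL_NS:
                return False
    return True

# Private child with a UBL-namespace descendant: allowed by the rule.
GOOD = ('<u:Invoice xmlns:u="urn:example:ubl" xmlns:p="urn:example:private">'
        '<u:UBLExtension><p:Extra><u:ID>1</u:ID></p:Extra></u:UBLExtension>'
        '</u:Invoice>')

# UBL-namespace element as an immediate child: not allowed.
BAD = ('<u:Invoice xmlns:u="urn:example:ubl">'
       '<u:UBLExtension><u:ID>1</u:ID></u:UBLExtension>'
       '</u:Invoice>')

if __name__ == "__main__":
    print(extension_children_ok(ET.fromstring(GOOD)))
    print(extension_children_ok(ET.fromstring(BAD)))
```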

And I suggested that we put this extension element at the very top of 
the instance so that applications using stream-oriented APIs such as 
SAX will be aware of the presence of any extensions before they have 
to deal with any standardized content.
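
To illustrate the streaming argument, here is a small sketch using
Python's xml.sax. The namespace URIs and the sample documents are
assumptions; the handler only records the order in which the children
of the document element arrive.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

UBL_NS = "urn:example:ubl"  # hypothetical placeholder namespace

class TopLevelOrder(ContentHandler):
    """Record children of the document element in arrival order."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.top_level = []

    def startElementNS(self, name, qname, attrs):
        self.depth += 1
        if self.depth == 2:
            self.top_level.append(name)  # (namespace URI, local name)

    def endElementNS(self, name, qname):
        self.depth -= 1

def extension_comes_first(xml_text):
    """True if the first child of the document element is UBLExtension."""
    handler = TopLevelOrder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.top_level[:1] == [(UBL_NS, "UBLExtension")]

EXT_FIRST = ('<u:Invoice xmlns:u="urn:example:ubl" xmlns:p="urn:example:private">'
             '<u:UBLExtension><p:Extra/></u:UBLExtension><u:ID>1</u:ID>'
             '</u:Invoice>')
EXT_LAST = ('<u:Invoice xmlns:u="urn:example:ubl" xmlns:p="urn:example:private">'
            '<u:ID>1</u:ID><u:UBLExtension><p:Extra/></u:UBLExtension>'
            '</u:Invoice>')

if __name__ == "__main__":
    print(extension_comes_first(EXT_FIRST))
    print(extension_comes_first(EXT_LAST))
```

With the extension at the top, a stream-oriented application knows
about it before the first standardized element is reported; with the
extension at the end, it would already have processed the standardized
content.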

Note that we have provided for only a single point of extension for 
every instance.

>If structural conformance is demonstrated by validating message
>instances to UBL schema (I mean actual XSDs - for the moment), then
>does that mean that :-
>
>a. there is no possibility for trading partners to 'insert' any
>private data into a UBL specified content model (as foreign namespaced
>items) even if that context make the most sense, since the message
>would then be schema invalid ?

Now there is ... with UBLExtension any private use under that element 
will pass structural validation.  And with my proposed use of NVDL, 
the children underneath and their descendants can be despatched for 
validation with their own schemas.
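
The despatching idea can be imitated in a few lines of Python. This is
not NVDL and does not use any NVDL implementation; it is a sketch of
the principle only, with placeholder namespace URIs and a hypothetical
validate_private function standing in for a real per-namespace schema.

```python
import xml.etree.ElementTree as ET

UBL_NS = "urn:example:ubl"          # hypothetical placeholder
PRIVATE_NS = "urn:example:private"  # hypothetical placeholder

def validate_private(elem):
    """Hypothetical stand-in for validating a private subtree with its own schema."""
    return elem.find("{%s}Code" % PRIVATE_NS) is not None

# One validator per namespace; namespaces with no entry are left unvalidated.
VALIDATORS = {PRIVATE_NS: validate_private}

def dispatch_extensions(root):
    """Send each child of UBLExtension to the validator for its namespace."""
    results = []
    for ext in root.iter("{%s}UBLExtension" % UBL_NS):
        for child in ext:
            tag = child.tag
            ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
            validator = VALIDATORS.get(ns)
            results.append((tag, validator(child) if validator else None))
    return results

DOC = ('<u:Invoice xmlns:u="urn:example:ubl" xmlns:p="urn:example:private" '
       'xmlns:q="urn:example:other"><u:UBLExtension>'
       '<p:Extra><p:Code>A</p:Code></p:Extra><q:Thing/>'
       '</u:UBLExtension></u:Invoice>')

if __name__ == "__main__":
    for tag, outcome in dispatch_extensions(ET.fromstring(DOC)):
        print(tag, outcome)
```

Each result pairs an extension child with True or False from its own
validator, or None where no model is registered for that namespace.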

>b. if the approach is to validate aspects of the message rather than
>the message as a whole, what does that mean in terms of a message that
>is a composite of a number of a business entity based schema, doesn't
>the context of where these individual parts exist in the overall
>message typically lend as much to validation as the individual part
>itself ?

I believe context does, but when that was weighed against complexity
I was voted down, which is why there is only the one point of
extension.  I contended that other useful points of extension would be
with parties and with line items, thereby using context to convey
information about the extension, but this is not being allowed.  When
using the extension, the extension designer must find a way to
associate their information items with the information items in the
standardized locations.

>c. if vocabularies are best developed in a way which allows
>information items that are not part of its specification to be 'safely
>ignored' (this has been a bit of a theme that has been coming through
>recently), then is that really saying that traditional methods of
>validating messages (i.e. validating parsers which load up XSDs) won't
>really work, we need to move to validation via positional/context
>based rules (XPath) ?

Or, I think more appropriately, NVDL, since we are still talking
structural constraints.  Positional/context based rules using XPath
files or ISO/IEC 19757-3 Schematron or other assertions are, I
believe, business constraints and coded value constraints, not
structural constraints.  Structural schemas are still, I believe,
most appropriate for structural constraints.

>d. what about NVDL, (or possibly CAM and/or other methods that are
>being talked about).

I think ISO/IEC 19757-4 NVDL is the mechanism by which we can safely 
look at XML instances using the view that sets of labeled information 
found in a single instance *each have their own model* ... those sets 
identified unambiguously through the use of namespace-rich 
labels.  This is not achievable with the traditional view that the 
entire instance *has a single model* that is sacrosanct.  The real 
world does not accommodate this traditional view well when trying to 
accommodate different users' needs.

I cannot comment on CAM as there are just not enough hours in the day 
to be following everything that is out there and I am not equipped to 
provide a judgement on it or to explore its applicability.  David 
Webber is the CAM expert, not me, and I respectfully defer to him for 
any CAM judgement.  In my sphere of influence as the ISO/IEC JTC 1/SC 
34 Secretariat Manager, I am far more focused on the ISO standards.

I hope this helps.

. . . . . . . . . . . . Ken

--
Registration open for XSLT/XSL-FO training: Wash.,DC 2006-06-12/16
Also for XSLT/XSL-FO training:    Minneapolis, MN 2006-07-31/08-04
Also for XML/XSLT/XSL-FO training:Birmingham,England 2006-05-22/25
Also for XSLT/XSL-FO training:    Copenhagen,Denmark 2006-05-08/11
World-wide on-site corporate, govt. & user group XML/XSL training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/u/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/u/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal