
Subject: Re: [ubl] Re: Namespace URI string implications
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: ubl@lists.oasis-open.org
Date: Fri, 09 Jun 2006 22:18:53 -0400

At 2006-06-09 13:47 -0700, jon.bosak@sun.com wrote:
>I've just reviewed the messages in this thread, and I have a
>couple of personal observations to make in advance of Ken's
>promised document on the subject.

(finally posted)

>First, I think that it's important to recall the meaning of "minor
>version."  I'm deliberately stating this from memory in order to
>expose the state of my understanding: A minor version is one
>against whose schemas instances conforming to the previous major
>version will continue to validate.  So, for example, a UBL version
>(let's call it 2.1 for purposes of discussion) is a minor version
>if valid UBL 2.0 instances will continue to validate against new
>UBL 2.1 schemas.  Implicit in this definition is the idea that
>minor versions can only contain additions to the previous major
>version; they cannot eliminate any information items or make
>mandatory any information items that were previously declared to
>be optional (though they can make optional items that were
>previously mandatory).  In other words, every minor version schema
>is a superset of the previous major version of that schema.

Everything up to here is totally acceptable to me.

>The
>point of minor versioning is to allow updates to the schemas
>without requiring implementors of the previous major version to
>revise all their software.

But I have a problem with the above ... I believe 
they *do* have to revise all their validation 
processes and software if there are additions to 
the document models and we haven't accommodated 
that in our NDRs for the major version.  I have a 
discussion of this in Section 9.3 of my discussion paper.

In a traditional XML-based system, the instance 
is validated and the program reads the valid 
instance into known structures and acts on the information in those structures.

A UBL 2.0 system, therefore, has 2.0 schemas for 
validation and 2.0 structures built into its 
applications.  Pass to this existing UBL 2.0 
system a new 2.1 instance with 2.1 structures 
added to the document ... say a new sibling 
element at the end of a bunch of children of an 
existing element:  the validation fails because 
the 2.0 schema doesn't know of the new child 
element, and the application fails because it is 
handed more information than its data structures can load.

I've proposed we consider having the NDRs require 
that schema generation emit <xsd:any> with a 
namespace of ##any as the last child of every 
element that has element children.  This needs 
experimentation, but with this in place in UBL 
2.0, a validating process and an application 
built for 2.0 will not choke on the presence of a 
UBL 2.1 construct in the instance.  Validation 
upgrades in a heterogeneous network can then 
proceed piecemeal:  only the upgraded nodes 
validate the new constructs, while the 
as-yet-to-be-upgraded nodes don't choke on them.
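
A minimal sketch of the effect I'm describing, in Python.  This is 
not XSD validation:  it only mimics an ordered sequence content 
model with an optional trailing wildcard.  The function name 
accepts, the element names and the two models are made up for 
illustration, and each element is assumed to occur at most once.

```python
def accepts(children, model, trailing_any=False):
    """Check ordered child names against a simplified sequence model.

    model is a list of (name, required) pairs in document order.
    trailing_any=True mimics a wildcard as the last particle, so any
    children left over after the known ones are tolerated.
    """
    i = 0
    for name, required in model:
        if i < len(children) and children[i] == name:
            i += 1          # this particle is satisfied by the next child
        elif required:
            return False    # a mandatory item is missing or out of order
    # Leftover children are only acceptable when the wildcard is present.
    return trailing_any or i == len(children)


# Hypothetical content models: the "2.1" one only adds an optional item.
MODEL_20 = [("ID", True), ("IssueDate", True), ("Note", False)]
MODEL_21 = MODEL_20 + [("NewItem", False)]

instance_20 = ["ID", "IssueDate"]
instance_21 = ["ID", "IssueDate", "NewItem"]

old_on_new = accepts(instance_20, MODEL_21)
new_on_old_strict = accepts(instance_21, MODEL_20)
new_on_old_lenient = accepts(instance_21, MODEL_20, trailing_any=True)
```

Under these assumptions the 2.0 instance passes the 2.1 model, the 
2.1 instance fails the strict 2.0 model, and the same 2.1 instance 
passes once the 2.0 model ends with the wildcard.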

>So the first thing I'd like to observe is that if the appearance
>of a 2.0 namespace URI will prevent a 2.0 document instance from
>validating in an environment expecting a 2.1 document, then there
>can be no such thing as a minor version as we have defined it.

I don't think there is a barrier in the above ... 
since the namespace hasn't changed, and the new 
2.1 information items are optional, a UBL 2.0 
instance should validate without problems with a UBL 2.1 model.

The problem is in the other direction ... a 
heterogeneous network of validation processes or 
a heterogeneous set of applications cannot be 
wholly upgraded en masse by being taken down, 
changed throughout, and brought back up again 
with explicit UBL 2.1 support.

>This doesn't just apply to UBL but to every XML vocabulary.  So
>either (a) we and every other XML effort are going to have to
>abandon the concept of minor versioning, or (b) the factor that's
>preventing the 2.0 document from validating in a 2.1 environment
>is a bug in the way that namespaces are implemented, and we're
>going to have to figure out a workaround for it.

I believe the problem is not in that direction, 
but the other way around.  It is the established 
installed base that breaks in the presence of 
2.1, not a new 2.1 installation breaking in the presence of 2.0.

>The second thing I'd like to say is that I personally believe the
>notion of blind interchange to be unrealistic.  I simply cannot
>imagine a real-world business accepting either a purchase order or
>an invoice without some prior out-of-band agreement (even if it's
>only a handshake or a phone conversation).

I've been talking about blind interchange *at the 
program level*, not at the business level.  I 
agree ... I'm not going to send an invoice 
following the (imaginary) Canadian Subset 
extensions to the Danish implementation of the 
North European Subset and expect to be paid just 
because I've sent the instance.  We will have 
agreed that I am allowed to send the instance, 
their systems will be prepared to accept the 
instance, and they would have documented their 
NES and country-specific specializations of the NES.

The serendipity happens when that application of 
theirs can accept my instance without having made 
any changes to the software because it accepts 
the common bits of UBL-standard information items 
and accepts that as sufficient without the 
absolute requirement for the extra levels of 
detail and information found in extensions.

The serendipity I've been thinking about isn't 
the blind "I hope the system processes this 
invoice without knowing about me", but the 
"luckily the system was defined in such a way 
that, when I send the invoice they are expecting 
from me, they don't have to change their software 
and I don't have to change my instance (or the 
software that created my instance)."

>Common B2C portals
>like amazon.com are not examples of blind interchange, because
>they enforce the input format through generation of the portal
>input forms, and they rely upon payment agreements that are far
>from ad hoc.  If anyone can think of a real-world example of the
>unconstrained blind interchange of a legally binding business
>document, I'd like to hear it.  This seems somehow to have become
>a requirement, but I'm not sure whose it is.

I've been talking about the implementation being 
blind to the extensions in a received instance it 
knows nothing about and being able to process an 
instance without change ... not the blind 
business aspects of "slipping in an invoice undetected."

>Being kind of a simple-minded guy, therefore, I conceive this
>issue in terms of the following scenario and its two basic forms.
>
>Scenario: Company A has implemented 2.1 in software, while company
>B is still at 2.0.  A and B have thought this through together and
>have decided that A can do without 2.1 items in 2.0 instances from
>B and B can ignore the added items in 2.1, thus enabling B to
>avoid a software upgrade.

But B cannot ignore the added items in 2.1 if we 
don't make special accommodations in our NDRs 
that were not present in the January 2006 beta 
release of 2.0 ... and implementations that have 
pre-compiled structures based on the January 2006 
2.0 schemas will reject the loading of XML 
instances that have unexpectedly-structured content.

Without the accommodations, validation fails and programs stop working.

If we experiment with 2.0 having <xsd:any> with 
##any as the last child of every element with 
element children, then 2.0 schemas will 
accommodate 2.1 elements and 2.0 pre-compiled 
structures should be able to ignore the 2.1 elements without error.

And I have proposed in my paper that the new 2.1 
information items have their own 2.1 namespace 
URI string ... we can then detect new constructs 
by their namespace, and users of UBL will be able 
to track down the definition of new constructs by 
knowing in which release they were added.  Every 
instance will clearly mark, on each information 
item, where that information item is defined.
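
As a rough illustration (not taken from the paper), the sketch 
below groups the elements of an instance by namespace URI, so the 
release that introduced each item can be read straight off the 
instance.  The URIs urn:example:ubl:2.0 and urn:example:ubl:2.1, 
the element names and the function name items_by_namespace are 
placeholders I've made up; they are not real UBL identifiers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def items_by_namespace(xml_text):
    """Map each namespace URI to the local names of the elements using it."""
    groups = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells qualified names as "{namespace-uri}local".
        if elem.tag.startswith("{"):
            ns, local = elem.tag[1:].split("}", 1)
        else:
            ns, local = "", elem.tag
        groups[ns].append(local)
    return dict(groups)


# Hypothetical instance: 2.0 items plus one item added in "2.1".
SAMPLE = """<Invoice xmlns="urn:example:ubl:2.0"
                     xmlns:n="urn:example:ubl:2.1">
  <ID>42</ID>
  <n:CarbonNote>low</n:CarbonNote>
</Invoice>"""

grouped = items_by_namespace(SAMPLE)
```

Here grouped holds one list of element names per namespace URI, 
which is all a receiver needs to tell old constructs from new ones.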

>Situation 1: B sends A a 2.0 document.
>
>    Solution for Situation 1: A's input filter peeks at B's
>    document and changes the namespace to 2.1 before processing in
>    order to fool its 2.1 software into handling it.  (By our
>    definition of minor version, a valid 2.0 document is, except
>    for the namespace declaration, also a valid 2.1 document.)  We
>    can characterize this as an XSLT solution if we want, but the
>    fact is that it could be done with sed or perl or even by hand.

None of this should be necessary ... a 2.1 
application already validates and accommodates a 2.0 instance.

>    Note that we already considered this approach when discussing
>    customization two years ago in Hong Kong; from my notes of that
>    session (published to the TC that week as
>    chair-opinion-20040513.pdf):
>
>       Use case 2
>
>        - An XYZ industry profile is developed by defining XYZ
>          schemas that are proper subsets of the UBL 1.0
>          schemas. The definition of “proper subset” is
>          that any valid XYZ instance is also a valid UBL 1.0
>          instance.

Candidate users of UBL indicate they also need to 
be able to add information items ... hence our creation of the extension point.

>        - Action for UBL TC: Because the XYZ instances will carry a
>          non-UBL namespace, we need to (or should) develop a
>          simple technique whereby XYZ instances can be made to
>          look to off-the-shelf UBL 1.0 applications like UBL 1.0
>          instances. Perhaps this could take the form of a
>          configuration file recommended for inclusion in every
>          conformant UBL 1.0 processor that will allow it to
>          recognize that the XYZ namespace is in fact a subset of
>          the UBL 1.0 namespace and substitute the UBL 1.0
>          namespace for the XYZ namespace as the first step in
>          instance processing.

I don't think namespace substitution is the way 
to go ... we lose identity of the constructs.

I do think transformation is critical for 
subsets, because subsets can totally remove an 
optional element and the transformation is needed 
to remove the valid presence of that optional UBL 
element before the subset application, tuned not 
to receive the optional element, gets the instance.

>    Note also that any scenario in which A's input filter can peek
>    at the namespace URI before validating it is a scenario in
>    which it can peek at a version attribute or element before
>    validating it.  So I don't see why the version info has to be
>    in the namespace URI.

I'm not sure about the "peeking" business ... 
because the information items are labelled with 
namespaces, A is going to have to know all 
possible ways of identifying the element if it is 
going to do conditional processing.

>Situation 2: A sends B a 2.1 document.
>
>    Solution for Situation 2: An XSLT filter (or perl script or
>    whatever) at B strips out the information items not in 2.0
>    (thus changing it into something indistinguishable from a 2.0
>    instance) *and* changes the namespace URI back to 2.0 so that
>    B's software can process it.  This is presumably something like
>    what Ken is going to propose to us.

Yes, but since I have proposed that only new UBL 
2.1 information items have the new 2.1 namespace 
URI, and that every existing UBL 2.0 information 
item retains its old 2.0 namespace URI, the only 
transformation going on is the removal of 
unexpected constructs ... there is no 
transliteration between namespaces.

In your scenario, "B" has to know of the existence 
of 2.1 before it can accommodate 2.1 by changing 
it to 2.0 ... in my scenario, "B" does not have 
to know of the existence of 2.1, because it can 
automatically transform an instance of 2.1 into 
an instance of 2.0 by preserving the 2.0 
constructs and eliding everything it doesn't 
recognize.  The B user doesn't have to change 
anything until it wants access to 2.1 constructs 
... the installed 2.0 system automatically 
accommodates 2.1, 2.2, ad infinitum, and is 
upgraded only when the user chooses to upgrade.
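
A minimal sketch of such a "keep only what I recognize" filter, 
in Python rather than XSLT.  It assumes the per-item namespace 
scheme described above; the URIs urn:example:ubl:2.0 and 
urn:example:ubl:2.1, the element names and the function name 
keep_known are placeholders of mine, not real UBL identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder for the one namespace this installation knows.
KNOWN_NS = "urn:example:ubl:2.0"


def keep_known(elem, known_ns=KNOWN_NS):
    """Remove, in place, every descendant element not in known_ns."""
    prefix = "{" + known_ns + "}"
    for child in list(elem):        # copy the list: we mutate elem below
        if child.tag.startswith(prefix):
            keep_known(child, known_ns)
        else:
            elem.remove(child)      # elide the whole unrecognized subtree
    return elem


# Hypothetical "2.1" instance: new items sit in their own namespace.
SAMPLE = """<Invoice xmlns="urn:example:ubl:2.0"
                     xmlns:n="urn:example:ubl:2.1">
  <ID>42</ID>
  <Party><Name>Acme</Name><n:Rating>A</n:Rating></Party>
  <n:CarbonNote>low</n:CarbonNote>
</Invoice>"""

root = keep_known(ET.fromstring(SAMPLE))
remaining = [e.tag.split("}", 1)[1] for e in root.iter()]
```

Note that the filter names only the namespace it knows, never the 
ones it doesn't, so the same code would elide items from a later 
release without being touched.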

I'm trying to think here of installations and 
existing software and systems ... I'll let the 
business rules of running a business determine 
when software gets the instances; I just want the 
software to work with new instances without any 
changes, upgrades, or modified transformation filters.

As described in my paper.

>If so, I'd like to
>    recommend that the appropriate XSLT filter be made part of each
>    minor version release.

Even so, it would require installation with every 
minor release ... I don't think we can expect users to accept that.

I think installations would be interested in a 
software system design that only has to be 
changed when you want access to new information 
and does not need to be changed in any way if it 
is already doing what the user wants.

>Note again that if you can believe in
>    an input process that can peep the namespace URI, you can
>    believe that it can just as easily (or darn near as easily)
>    peep a version attribute or element.  So as before, I don't see
>    why the version info has to be in the namespace URI.

I think it would help to have the version info in 
the namespace URI at the definition of a new UBL 
information item, but not to change the namespace 
URI of any existing UBL information items.

>The only way I can imagine Situation 2 working in a blind
>interchange environment is if B, upon receiving a 2.1 instance
>from a previously unknown potential partner A, responds with a
>message to the effect that information items beyond those
>specified in 2.0 will be ignored, continue anyway? -- something
>like what you get when you open a current word processor document
>in an old version of the software.  But again, I find it hard to
>imagine this working effectively in real life.

The business side can determine what makes sense 
to send back and forth ... on the technical side, 
I believe my approach is resilient to change, 
forward compatible, and gives implementations the 
luxury of changing when they want to, not when they have to.

I've not seen the approach I've proposed taken 
before:  a "filter only those things I'm 
expecting" design that is forward compatible and 
supports heterogeneous networks of installations, 
using namespace URI strings *at the information 
item level* ... but I'm quite confident it will 
work.  And I think installations will appreciate 
these features.

I'll leave other comments to discussion of the paper I've posted.

Thanks, Jon, for sharing your thoughts ... please 
continue this thread if you have specific 
questions on my comments above, or phrase your 
concerns in terms of the sections of the paper.

. . . . . . . . . . . . . Ken

p.s. I'm rushing to write this ... please excuse any obvious typos.

--
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)