ubl-dev message

Subject: Re: [ubl-dev] A personal perspective on considerations for UBL subsets, extensions, versions, validation and interchange
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: UBL-Dev <ubl-dev@lists.oasis-open.org>
Date: Mon, 19 Jun 2006 10:52:29 -0400

Good morning, Fraser, thank you for your feedback.

At 2006-06-19 11:27 +0100, Fraser Goffin wrote:
>I've been meaning to feedback some thoughts about my initial read of
>your doc (UBL 2.0 subsets, extensions, versions, validation and
>interchange) for a week or so now, but as usual got distracted.
>Fortunately, I made a few notes, so without re-reading to see if I've
>misunderstood something (I'm sure you'll point that out ;-) here they
>are :-

Thanks!  Understandably consideration of this 
feedback did not make it into last night's 
version 0.3 ... sorry I didn't announce it last 
night but it was late on Father's Day and I was 
anxious to turn the computer off.

>Section 3.3 - The 'Serendipity Factor'
>
>Final para / final sentence - what does '... without authorization or
>intervention to prevent misuse', mean ?

What I mean by "serendipity" and "blind interchange" is not meant to
imply that the system design can accept a document out of the blue
without authorization, or without some intervention by the receiver.
Early readers misinterpreted me as expecting to design an environment
that would respond to unexpected and unsolicited document interchange
... when in fact I was describing my concept of serendipity only from
a system perspective, not from a business perspective.  So, the system
design accommodates an authorized transaction with another system
without having to retool, thus making UBL a desirable technology to
embrace ... not that an open UBL system is somehow open to abuse
because barriers have been removed.

>Section 3.4 - A pure-XSD expression of constraints is highly desirable
>
>>'Independently I've had three people comment to me of the importance of
>>equipping programmers with XSD expressions of XML document models
>>because these programmers never see the angle brackets of XML....'
>
>I also find it desirable to have a pure XSD expression of constraints
>although not really because of this reason.

Interesting!

>In fact, I often think
>that this requirement is somewhat over-played and may be a reflection
>of the distinction between those who are really implementing
>data/object centric services via an XML veneer of an existing system API,

Okay ... I can only go by what I hear from those 
who give me their admonitions of other 
technologies because of their pure-W3C-schema interface to the markup world.

>and those that
>are concerned with document/business process led development.
>Of course there are cases for both, but personally I am predominantly in
>the latter camp, although I recognise that, probably because of the
>historical evolution of XML based services, the majority may still be
>in the former. I think this will change.

I'm pleased to hear this.

>I am much more inclined to the view that XML/XSD is a 'first class'
>type system and that a preoccupation with mapping to/from other
>programmatic type systems is very often un-necessary, inefficient and
>sometimes impossible !

Well, that certainly is not what I was told, 
independently, during my 'round-the-world UBL and standards trip last month.

>I don't happen to think that the APIs for
>manipulating XML natively are any more difficult than any other, and
>in many cases represent the best fit for purpose tooling available.
>I'm sure you are aware of the perma-threads that continue to run on
>this subject, many concluding that :-
>
>a. XML <-> Object mapping suffers significant fidelity and impedance
>issues (as does XML <-> relational). This typically leads to the need
>to only use a compatible subset of XSD types/derivations/content
>models.
>
>b. In turning XML into a class hierarchy, flexibility in the face of
>change is degraded often leading to brittle, intrusive and expensive
>change management (even for what might in some cases be considered
>minor (non-breaking) changes in XML).

Thanks for that summary ... I was not aware of the issues ...

>Not saying the justification that you have included is wrong or
>mis-leading (clearly neither is true given your later statement about
>the 'overwhelming feedback'), I just feel that the case might be made
>stronger by recognising that this is not everyone's motivation.

But it happens to be the motivation expressed by 
sufficient numbers to practically forbid a non-W3C-Schema approach.

Thanks for taking the time to summarize that.

>Section 5.1  -  UBL Conformant Instances
>
>To be clear, is an instance UBL conformant if no constraint violations
>occur when validating against the FULL UBL schema, a subset UBL
>schema, or both ?

(Remember that the committee hasn't adopted these 
terms, these are just my proposals).

I think "UBL Conformance" is independent of 
"Subset Conformance" and that it will be up to 
subset definers to define what they see as 
conformance to a subset.  So, in this clause, I 
was thinking solely about no constraint violations of the full UBL schema.

>Section 5.3 - UBL open systems
>
>(3) seems in conflict with serendipitous exchange ?

Oh, I was hoping that was fully in support of 
(and equivalent to) serendipitous exchange.  I 
seem unable to convey my thoughts that I'm 
talking solely about the resilience of a system 
to accommodate unexpected inputs by acting on the 
expected content without rejecting the instance 
because of the unexpected content.  Perhaps I should change the wording.

I was hoping the term "UBL Open System" would 
characterize those systems supporting 
serendipitous exchanges, such that retooling 
would not be necessary to accommodate new 
business relationships set up with trading 
partners already using UBL with systems they've 
already created for use in other contexts.

>Section 7.0 SubSets
>
>>It should be a stated guideline for subsets 
>>that any information item that most
>>appropriately belongs in the standardized component should go in the
>>standardized component and not in the subset extension.
>
>I like that statement a lot. But I am concerned about the governance model.

Indeed ... but I thought it was just common sense.

>First, how does UBL keep control of its own standard

I don't see a "controlling" role for the UBL 
technical committee ... since the standard is 
open for anyone to use (and, indeed, every month 
we are hearing of existing deployments of UBL of 
which no-one on the committee was aware) there 
really is no governance.  Only guidance.

The only *normative* components of UBL are the 
schema expressions.  The committee can only 
publish guidelines for use, and with my training 
I am only advocating approaches to using the normative components.

>and ensure (as
>far as that is reasonable) that implementers don't abuse the standard
>for private 'bastardized' exchange vocabularies using extensibility as
>the mechanism for introducing data that UBL either won't sanction or
>doesn't include quickly enough (this currently seems to be in part
>reliant on NDR and partly on UBL Conformance processes that you
>describe later - but is this sufficient, and is it too complicated ?
>(in my case this has been the primary reason for the standards body
>dis-allowing any extension of the standard even for private data (they
>fear over-use of that facility and a consequent diminishment of the
>standard) ?

I see ... well, I suppose that is a risk ... and 
indeed we saw that in the HTML world when there 
were tangents in the early days that were not 
finally reined in until the W3C brought out CSS 
to stop the bastardization that was going on by browser manufacturers.

But hopefully with the extension point we have 
engineered extensibility in such a way that UBL 
can continue to interoperate and when it comes 
time to do a UBL 3.0 we can look at common uses 
of extensions to determine what might be best migrated into the body.

>Second, how do implementers ensure that where they expose a service
>interface that claims conformance to UBL, that if their trading
>partners send non standard stuff they can (should ?) detect and reject
>? I assume this is covered by the statement that extension MUST be
>within the extension 'area' only and a subset can only validly remove
>optional information items (thus producing an instance that is valid
>to the superset standard).

Yes, I was hoping that statement would be 
followed by subset designers, for the reason that 
an instance of the subset is still an instance of full UBL.

>This makes me think of 2 points :-
>
>1. Does it matter whether structural validation is performed using a
>subset schema (ie. one with optionals of no interest removed) or
>against the full UBL schema ? (I guess this is really a question about
>how to ensure that a subset schema is a valid instance of the
>corresponding UBL schema (particularly given the statements later
>about 'transform before validate') ?

Indeed I raised this in my revisions in last 
night's version ... section 8.2.2. "Application 
handling of an arbitrary UBL instance input" 
talks about the option of validating the incoming 
instance against full UBL *before* performing the 
transform-before-validate process so that the 
transformation does not mask true 
non-UBL-conformant instances by making such an 
instance subset-conformant by deleting the "bad stuff".
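
To make the false-positive risk concrete, the following is a minimal
Python sketch, not anything taken from the document under discussion.
Real XSD validation is replaced by a simple check of element names
against two hypothetical vocabularies (FULL_UBL_NAMES and
SUBSET_NAMES); the element names, the accept() function and its
check_full_ubl flag are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies standing in for real schema validation.
FULL_UBL_NAMES = {"Order", "ID", "IssueDate", "Note", "OrderLine"}
SUBSET_NAMES = {"Order", "ID", "OrderLine"}


def unknown_names(root, allowed):
    """Return the sorted names of elements not in the allowed vocabulary."""
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})


def filter_to_subset(root):
    """Transform step: delete every child element the subset does not know."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in SUBSET_NAMES:
                parent.remove(child)
    return root


def accept(xml_text, check_full_ubl):
    """Return (accepted, reason) for an incoming instance."""
    root = ET.fromstring(xml_text)
    if check_full_ubl:
        # Optional step: reject non-UBL content before the filter can hide it.
        bad = unknown_names(root, FULL_UBL_NAMES)
        if bad:
            return False, "not UBL-conformant: " + ", ".join(bad)
    filter_to_subset(root)
    bad = unknown_names(root, SUBSET_NAMES)
    if bad:
        return False, "not subset-conformant: " + ", ".join(bad)
    return True, "accepted"


# GOOD uses an optional full-UBL item (Note) that the subset ignores.
GOOD = "<Order><ID>1</ID><Note>hello</Note><OrderLine/></Order>"
# BAD contains an element (Bogus) that is not in the full vocabulary at all.
BAD = "<Order><ID>1</ID><Bogus>oops</Bogus><OrderLine/></Order>"
```

With these toy inputs, accept(BAD, check_full_ubl=False) accepts the
instance because the filter deletes the unknown Bogus element before
the subset check runs, while accept(BAD, check_full_ubl=True) rejects
it.  GOOD is accepted either way.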

>2. Is it reasonable to assert (insist ?) that implementers shouldn't
>invent their own data types/aggregates if one exists in the standard.

As a guideline, yes ... if they "break" that rule 
they will have less interoperability with open 
UBL systems, so I'm hoping the common sense 
reason will be sufficient for implementers.

>If they need something with the same semantics and structure in a
>private extension I think they should use the standard. But should
>this be explicitly declared as UBL (in the UBL namespace) or should
>the process be for implementers to 'borrow' from the standard but use
>their own namespace (I think this formed part of your reasoning behind
>processContents='skip') ?

Extensions are allowed to utilize UBL constructs 
in the expression of the extension semantic 
(provided the apex of the extension, that being 
the child of the UBL extension point, is itself 
not a UBL construct) because the ancestral 
labeling of whatever is using the UBL constructs 
indicates a different purpose.  I would hope, for 
example, that an extension definition of a new 
party would exploit the existing party definition 
... indeed it would not make sense (though I 
admit that isn't a very strong constraint) for a 
subset definer to define a new party construct 
when all they need is a new party parent and use the existing party construct.
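
The apex rule above can be sketched as a small check.  This is only an
illustration under stated assumptions: the namespace URI
"urn:example:ubl", the extension point name "Extension" and the
function bad_extension_apexes() are hypothetical placeholders, not the
real UBL namespace or element names.

```python
import xml.etree.ElementTree as ET

UBL_NS = "urn:example:ubl"                   # hypothetical, not the real UBL URI
EXTENSION_POINT = "{%s}Extension" % UBL_NS   # hypothetical extension point name


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag such as '{uri}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def bad_extension_apexes(root):
    """List extension apex elements that are themselves UBL constructs.

    Only the direct children of the extension point are checked;
    deeper descendants are free to reuse UBL constructs.
    """
    bad = []
    for point in root.iter(EXTENSION_POINT):
        for apex in point:
            if namespace_of(apex.tag) == UBL_NS:
                bad.append(apex.tag)
    return bad


# A new party parent in a private namespace wrapping the existing party.
OK_DOC = (
    '<u:Order xmlns:u="urn:example:ubl" xmlns:x="urn:example:mine">'
    "<u:Extension><x:BrokerParty><u:Party><u:Name>Acme</u:Name></u:Party>"
    "</x:BrokerParty></u:Extension></u:Order>"
)
# A UBL construct used directly as the apex of the extension.
BAD_DOC = (
    '<u:Order xmlns:u="urn:example:ubl">'
    "<u:Extension><u:Party><u:Name>Acme</u:Name></u:Party></u:Extension>"
    "</u:Order>"
)
```

Here OK_DOC passes because the apex is the private x:BrokerParty, even
though it reuses u:Party underneath, whereas BAD_DOC is flagged because
u:Party sits directly under the extension point.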


>Section 7.1 - The choice of XSD for schema expression
>
>2nd from last para :-
>
>>'... a transformation that removes the information items not desirable to the
>>subset,..'
>
>Granted, but it might be useful to include something that picks up
>David Orchard's distinction of 'Must Ignore Unknown' approaches,
>specifically whether 'retain' or 'discard' is used. It is possible
>that information items that arrive in an inbound message are part of
>the required output, even when they are unused by the receivers
>business process (i.e. a sender may send and expect to receive a full
>UBL instance).

Well, I can understand the "send", but if the 
sender is dealing with a subset user, then I 
would not expect the sender to expect to receive 
a full UBL instance since the other party isn't a full UBL system.

>Similarly there may be legal, audit or other regulatory
>requirements which require that some items are reflected in request
>and responses and/or passed through to upstream processes.

How would these be described, and wouldn't the 
requirements be so arbitrary as to be inexpressible in a declarative format?

>I have seen
>this point made on the newsgroup in regard to whether exchanges are
>based on 'caveat emptor' or 'caveat venditor'.
>
>You might argue that this is simply the process of determining the
>subset schemata and filter processing, but it might be worth pointing
>out so as readers don't forget ?

Could you clarify the point to make?  That 
senders *cannot* expect anything beyond the fence 
(now described in last night's section 7.2 "The 
subset fence")?  Perhaps my new focus on that 
subject is sufficient ... please let me know.  I 
hope that I was able to address your concerns here in what I posted last night.

>Section 7.2. - Subset UBL Conformance
>
>>a subset instance must be UBL-conformant
>
>Also subset schemata presumably ?

Yes, that was implied but I see perhaps I should have stated that explicitly.

>First para after bullet points:
>
>>'A subset schema cannot be used for validation directly ....
>
>Would it be a desirable approach to validate to the FULL UBL schema
>BEFORE the filter transform ?.

Indeed I had anticipated this question based on 
more analysis of my data flow diagrams and 
explicitly call this out in Figure 4. "Subset 
handling of arbitrary UBL instance input".

>Notwithstanding the subset deployment
>recommendations in section 8, if implementers didn't go that far for
>whatever reason, or they just got part of it wrong (say the filter
>processing was 'buggy' - it might give the appearance of valid UBL,
>but it is actually making invalid UBL 'valid' by inadvertently
>removing invalid items), wouldn't it be better to separate out UBL
>conformance so that :-
>
>a) a received message can be checked to be fully UBL conformant and if
>not rejected as such
>
>b) the filter processing is based ONLY on valid UBL instances. If the
>subset validation fails, the reasons can be clearly distinguished (the
>message does not conform to the subset schemata/rules, or the filter
>is buggy).

Right ... which is why I've added it in ... I 
realized there was the opportunity for "false 
positives" because the transform-before-validate 
transformation could hide UBL inconsistencies.

But note that I've still made it optional (though 
recommended) given that implementers can choose 
how complex to make their systems.  Their choice 
of implementation level will dictate how complete 
and interoperable their systems are.

This, again, was driven into me during my 
trip:  "make UBL easy to deploy" ... but my 
rebuttal of "but an easily-deployed system will 
have drawbacks in interoperability and error detection" was disregarded.

So, you will see in effect three levels of 
implementation from easiest (least overhead) to hardest (most overhead):

(1) - Figure 3. "Subset handling of pure subset instance input"
(2) - Figure 4. "Subset handling of arbitrary UBL 
instance input" without full UBL checking
(3) - Figure 4. "Subset handling of arbitrary UBL 
instance input" with full UBL checking

>Section 8.2.2 - Application handling of an arbitrary instance input
>
>Final para/sentence :
>
>>'Considering section 5.3, .....
>
>Doesn't this conflict with the idea that an 'open' UBL system should
>be able to operate even when extensions and optional items that it can
>process are absent ? I'm not sure, I'm having a bit of trouble with this
>concept. Are we talking about some form of 'fall back' behaviour ?

No, I'm trying to convey that a "UBL Open system" 
is one that implements transform-before-validate 
... thus accepting any instance of UBL because by 
the time the data reaches the application there 
is only the data that the application expects and 
not any data that the application doesn't expect.

>Figure 4.
>
>Previous comment about the desirability of running validation to full
>UBL schema before filter transform ? Do you think this is un-necessary
>?

I think that mandating it would leave the 
impression the system was too complex, but that 
recommending it would give implementations an 
extra level of conformance checking and would 
prevent the false positives from getting through.

Please let it be known if you think I should 
change those dotted lines into solid lines and 
mandate the UBL schema checking ... I can 
certainly live with that, and indeed would prefer 
it, but I'm trying to be accommodating by making 
it optional.  Perhaps that is too accommodating.

>Section 9 - Versions of UBL
>
>Phew, this is interesting, but I'm still mulling it over. A few things
>for now :-
>
>- given you are proposing processContents='skip' for extensions, I
>think it would be useful to a) explicitly identify that for those who
>want to validate data in an extension they need to do something extra
>(I didn't feel like this was covered by section 8) and, b) perhaps
>provide a description of some of the approaches that could be
>considered ?

Does the revised 8.1.2. "Subset supplemental 
validation artefact preparation" section now 
underscore the role of subset business rules?  My 
exemplar is the correlation of detailed 
line-item-level information in the extension with 
the summary line-item-level information in standardized UBL.
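
As one possible shape for such a supplemental business rule, here is a
minimal Python sketch.  Everything in it is an assumption made for
illustration: the element names (Line, ID, Amount, Extension,
LineDetail, LineID), the absence of namespaces and the function
line_mismatches() are hypothetical, and it assumes every line and
detail item carries both an identifier and an amount.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def line_mismatches(root):
    """Return IDs of lines whose extension detail does not match the summary.

    A line is reported when the detail amounts in the extension do not
    add up to the summary amount, or when the detail refers to a line
    that has no summary at all.  Lines without any detail are ignored.
    """
    summary = {
        line.findtext("ID"): Decimal(line.findtext("Amount"))
        for line in root.findall("Line")
    }
    detail = {}
    for item in root.findall("Extension/LineDetail"):
        line_id = item.findtext("LineID")
        amount = Decimal(item.findtext("Amount"))
        detail[line_id] = detail.get(line_id, Decimal("0")) + amount
    return sorted(i for i, total in detail.items() if summary.get(i) != total)


DOC = (
    "<Invoice>"
    "<Line><ID>1</ID><Amount>30.00</Amount></Line>"
    "<Line><ID>2</ID><Amount>10.00</Amount></Line>"
    "<Extension>"
    "<LineDetail><LineID>1</LineID><Amount>10.00</Amount></LineDetail>"
    "<LineDetail><LineID>1</LineID><Amount>20.00</Amount></LineDetail>"
    "<LineDetail><LineID>2</LineID><Amount>9.50</Amount></LineDetail>"
    "</Extension>"
    "</Invoice>"
)
```

For DOC, line 1 is consistent (10.00 + 20.00 matches 30.00) and line 2
is reported, since its detail totals 9.50 against a summary of 10.00.
A check of this kind has to run outside schema validation when the
extension content is skipped by the schema processor.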

>Section 9.4 - A running example of the proposed version extensibility
>
>ubl2.xsd  -  processContents = 'skip' for 'Extension and
>FutureVersions' - still think this could/should be 'lax', but I guess
>it somewhat depends on whether UBL want to allow implementers to use
>UBL namespaced items in an extension ?

I remain unconvinced. :{)}  I am so worried that using "lax" will
trigger false negatives by dictating to subset definers what their
subsets should look like, when it should be totally up to the subset
definers.

>ubl21.xsd  :-
>  -  why no Extension within element 'LineItem' ?

I supported this (and for party):

  http://lists.oasis-open.org/archives/ubl/200604/msg00010.html

But it was decided to have only one extension point:

  http://lists.oasis-open.org/archives/ubl/200604/msg00027.html (end of post)

And by now I'm convinced there should be only one extension point.

>  -  the LineItem content model is non deterministic isn't it ? (you
>have an optional element (u21:CountryOrigin) declared before
>'FutureVersions') ??

I believe it would be non-deterministic the other 
way, but not this way.  The particles are easily 
distinguished if the UBL 2.1 items are before 
extensions ... except for misspelled UBL 2.1 
items that end up falling under the version extension.

>Haven't seen any other comments on this doc on this list. Should I be
>looking elsewhere ?

No, I'm afraid that no-one else has taken the 
time to comment ... not even in 
committee.  Though I acknowledge it is a very 
long document to digest, I would not have written 
it all if I did not think we were at a most 
crucial stage of making such decisions 
before casting our next set of schemas (which are 
candidates for being final schemas).

I do appreciate that you took the time to post 
something, Fraser, thank you!  I hope others will 
share their thoughts soon based on my post of last night.

. . . . . . . . Ken


--
Registration open for UBL training:    Montréal, Canada 2006-08-07
Also for XSL-FO/XSLT training:    Minneapolis, MN 2006-07-31/08-04
Also for UBL/XML/XSLT/XSL-FO training: Varo,Denmark 06-09-25/10-06
World-wide corporate, govt. & user group UBL, XSL, & XML training.
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/u/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/u/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal