ubl-dev message

Subject: Re: [ubl-dev] A personal perspective on considerations for UBL subsets, extensions, versions, validation and interchange
From: "Fraser Goffin" <goffinf@googlemail.com>
To: "G. Ken Holman" <gkholman@cranesoftwrights.com>
Date: Mon, 19 Jun 2006 11:27:40 +0100

Ken,

I've been meaning to feed back some thoughts from my initial read of
your doc (UBL 2.0 subsets, extensions, versions, validation and
interchange) for a week or so now, but as usual got distracted.
Fortunately, I made a few notes, so without re-reading to see if I've
misunderstood something (I'm sure you'll point that out ;-) here they
are :-

Section 3.3 - The 'Serendipity Factor'

Final para / final sentence - what does '... without authorization or
intervention to prevent misuse' mean ?

Section 3.4 - A pure-XSD expression of constraints is highly desirable

> 'Independently I've had three people comment to me of the importance of
> equipping programmers with XSD expressions of XML document models
> because these programmers never see the angle brackets of XML....'

I also find it desirable to have a pure XSD expression of constraints,
although not really for this reason. In fact, I often think that this
requirement is somewhat over-played and may reflect the distinction
between those who are really implementing data/object centric
services via an XML veneer over an existing system API, and those who
are concerned with document/business process led development. Of
course there are cases for both, but personally I am predominantly in
the latter camp. I recognise that, probably because of the historical
evolution of XML based services, the majority may still be in the
former, but I think this will change.

I am much more inclined to the view that XML/XSD is a 'first class'
type system and that a preoccupation with mapping to/from other
programmatic type systems is very often unnecessary, inefficient and
sometimes impossible ! I don't happen to think that the APIs for
manipulating XML natively are any more difficult than any other, and
in many cases represent the best fit for purpose tooling available.
I'm sure you are aware of the perma-threads that continue to run on
this subject, many concluding that :-

a. XML <-> Object mapping suffers significant fidelity and impedance
issues (as does XML <-> relational). This typically leads to the need
to only use a compatible subset of XSD types/derivations/content
models.

b. In turning XML into a class hierarchy, flexibility in the face of
change is degraded, often leading to brittle, intrusive and expensive
change management (even for what might in some cases be considered a
minor (non-breaking) change in the XML).

I'm not saying the justification that you have included is wrong or
misleading (clearly neither is true given your later statement about
the 'overwhelming feedback'), I just feel that the case might be made
stronger by recognising that this is not everyone's motivation.

Section 5.1  -  UBL Conformant Instances

To be clear, is an instance UBL conformant if no constraint violations
occur when validating against the FULL UBL schema, a subset UBL
schema, or both ?

Section 5.3 - UBL open systems

(3) seems in conflict with serendipitous exchange ?

Section 7.0 - Subsets

> It should be a stated guideline for subsets that any information item that most
> appropriately belongs in the standardized component should go in the
> standardized component and not in the subset extension.

I like that statement a lot. But I am concerned about the governance model.

First, how does UBL keep control of its own standard and ensure (as
far as that is reasonable) that implementers don't abuse the standard
for private 'bastardized' exchange vocabularies, using extensibility as
the mechanism for introducing data that UBL either won't sanction or
doesn't include quickly enough ? This currently seems to rely partly
on NDR and partly on the UBL Conformance processes that you describe
later, but is this sufficient, and is it too complicated ? (In my case
this has been the primary reason for the standards body disallowing
any extension of the standard, even for private data: they fear
over-use of that facility and a consequent diminishment of the
standard.)

Second, how do implementers ensure that where they expose a service
interface that claims conformance to UBL, if their trading partners
send non-standard stuff they can (should ?) detect and reject it ?
I assume this is covered by the statement that extension MUST be
within the extension 'area' only and a subset can only validly remove
optional information items (thus producing an instance that is valid
to the superset standard). This makes me think of 2 points :-

1. Does it matter whether structural validation is performed using a
subset schema (i.e. one with optionals of no interest removed) or
against the full UBL schema ? (I guess this is really a question about
how to ensure that a subset schema is a valid instance of the
corresponding UBL schema, particularly given the statements later
about 'transform before validate'.)

2. Is it reasonable to assert (insist ?) that implementers shouldn't
invent their own data types/aggregates if one exists in the standard ?
If they need something with the same semantics and structure in a
private extension I think they should use the standard. But should
this be explicitly declared as UBL (in the UBL namespace) or should
the process be for implementers to 'borrow' from the standard but use
their own namespace (I think this formed part of your reasoning behind
processContents='skip') ?

Section 7.1 - The choice of XSD for schema expression

2nd from last para :-

> '... a transformation that removes the information items not desirable to the
> subset,..'

Granted, but it might be useful to include something that picks up
David Orchard's distinction of 'Must Ignore Unknown' approaches,
specifically whether 'retain' or 'discard' is used. It is possible
that information items that arrive in an inbound message are part of
the required output, even when they are unused by the receiver's
business process (i.e. a sender may send and expect to receive a full
UBL instance). Similarly there may be legal, audit or other regulatory
requirements which require that some items are reflected in requests
and responses and/or passed through to upstream processes. I have seen
this point made on the newsgroup in regard to whether exchanges are
based on 'caveat emptor' or 'caveat venditor'.

You might argue that this is simply the process of determining the
subset schemata and filter processing, but it might be worth pointing
out so that readers don't forget ?
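
To make the 'retain' versus 'discard' distinction concrete, here is a
minimal sketch in Python (standard library only). It is not taken from
the paper: the KNOWN set, the element names and the receive() helper
are hypothetical, and the filter only looks at direct children.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical subset: the only child elements this receiver understands.
KNOWN = {"ID", "Quantity"}


def filter_unknown(root):
    """Return a copy of root with direct children outside KNOWN removed."""
    filtered = copy.deepcopy(root)
    for child in list(filtered):
        if child.tag not in KNOWN:
            filtered.remove(child)
    return filtered


def receive(xml_text, policy):
    """Process an inbound instance under a 'retain' or 'discard' policy.

    Returns (working, outbound): the view handed to the business process
    and the document that is echoed back or passed on.
    """
    if policy not in ("retain", "discard"):
        raise ValueError("policy must be 'retain' or 'discard'")
    original = ET.fromstring(xml_text)
    working = filter_unknown(original)
    # retain: the untouched original travels onward;
    # discard: only the filtered view survives.
    outbound = original if policy == "retain" else working
    return working, outbound


inbound = (
    "<LineItem><ID>1</ID><Quantity>5</Quantity>"
    "<Note>audit ref 42</Note></LineItem>"
)
working, outbound = receive(inbound, "retain")
print([child.tag for child in working])
print([child.tag for child in outbound])
```

Under either policy the business process sees the same filtered view;
the only difference is whether the unknown items are still there when
the message is reflected back or handed on.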

Section 7.2. - Subset UBL Conformance

> a subset instance must be UBL-conformant

Also subset schemata presumably ?

First para after bullet points:

> 'A subset schema cannot be used for validation directly ....

Would it be a desirable approach to validate to the FULL UBL schema
BEFORE the filter transform ? Notwithstanding the subset deployment
recommendations in section 8, if implementers didn't go that far for
whatever reason, or they just got part of it wrong (say the filter
processing was 'buggy': it might give the appearance of valid UBL,
but actually be making invalid UBL 'valid' by inadvertently
removing invalid items), wouldn't it be better to separate out UBL
conformance so that :-

a) a received message can be checked to be fully UBL conformant and, if
not, rejected as such

b) the filter processing is based ONLY on valid UBL instances. If the
subset validation fails, the reasons can be clearly distinguished (the
message does not conform to the subset schemata/rules, or the filter
is buggy).
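
A rough sketch of the ordering I have in mind, in Python. The toy
validators below are only stand-ins for real schema validation (the
Python standard library has no XSD validator), and FULL_ITEMS,
SUBSET_ITEMS and every function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

# Hypothetical models: items allowed by the full schema and by the subset.
FULL_ITEMS = {"ID", "Quantity", "Note"}
SUBSET_ITEMS = {"ID", "Quantity"}


@dataclass
class Outcome:
    stage: str         # "full-ubl", "subset" or "ok"
    errors: List[str]


def validate_full(doc):
    """Stand-in for validation against the full UBL schema."""
    return [f"{c.tag} is not in the full model"
            for c in doc if c.tag not in FULL_ITEMS]


def filter_to_subset(doc):
    """Stand-in for the filter transform: keep only subset items."""
    out = ET.Element(doc.tag)
    out.extend([c for c in doc if c.tag in SUBSET_ITEMS])
    return out


def validate_subset(doc):
    """Stand-in for validation against the subset schema."""
    errors = [f"{c.tag} is not in the subset"
              for c in doc if c.tag not in SUBSET_ITEMS]
    if doc.find("ID") is None:
        errors.append("ID is required")
    return errors


def process(doc):
    """Validate against the full model first, then filter, then subset."""
    errors = validate_full(doc)
    if errors:
        return Outcome("full-ubl", errors)   # (a) not conformant: reject
    errors = validate_subset(filter_to_subset(doc))
    if errors:
        return Outcome("subset", errors)     # (b) subset rules or filter at fault
    return Outcome("ok", [])


bad = ET.fromstring("<LineItem><ID>1</ID><Bogus/></LineItem>")
print(process(bad))
# Filtering first hides the problem: the invalid item is silently removed.
print(validate_subset(filter_to_subset(bad)))
```

The last line shows the masking effect: with filter-then-validate the
invalid item has gone before anything looks at it, whereas process()
reports the failure at the full-model stage.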

Section 8.2.2 - Application handling of an arbitrary instance input

Final para/sentence :

> 'Considering section 5.3, .....

Doesn't this conflict with the idea that an 'open' UBL system should
be able to operate even with extensions and optional items that it can
process, absent ? I'm not sure, I'm having a bit of trouble with this
concept. Are we talking about some form of 'fall back' behaviour ?

Figure 4.

See my previous comment about the desirability of running validation
against the full UBL schema before the filter transform. Do you think
this is unnecessary ?

Section 9 - Versions of UBL

Phew, this is interesting, but I'm still mulling it over. A few things
for now :-

- given you are proposing processContents='skip' for extensions, I
think it would be useful to a) explicitly identify that for those who
want to validate data in an extension they need to do something extra
(I didn't feel like this was covered by section 8) and, b) perhaps
provide a description of some of the approaches that could be
considered ?
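
As one possible shape for that 'something extra', here is a sketch in
Python of a second pass that finds extension content and hands each
item to a checker registered for its namespace. This is only my guess
at an approach, not anything from the paper: the namespace URIs, the
checker and the document layout are invented for illustration, and
only the 'Extension' element name comes from your example.

```python
import xml.etree.ElementTree as ET


def namespace_of(tag):
    """Namespace URI of an ElementTree tag such as '{urn:x}Name'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def local_name(tag):
    """Local part of an ElementTree tag."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def check_extensions(root, checkers):
    """Run a per-namespace checker over each child of every Extension."""
    problems = []
    for element in root.iter():
        if local_name(element.tag) != "Extension":
            continue
        for child in element:
            ns = namespace_of(child.tag)
            checker = checkers.get(ns)
            if checker is None:
                problems.append(f"no checker registered for namespace {ns!r}")
            else:
                problems.extend(checker(child))
    return problems


def partner_checker(element):
    """Hypothetical rule: partner items must carry non-empty text."""
    if (element.text or "").strip():
        return []
    return [f"{local_name(element.tag)} must not be empty"]


# Hypothetical namespaces, used only for this illustration.
doc = ET.fromstring(
    '<u:Order xmlns:u="urn:example:ubl" xmlns:p="urn:example:partner"'
    ' xmlns:q="urn:example:other">'
    "<u:Extension><p:CostCentre/><q:Thing>x</q:Thing></u:Extension>"
    "</u:Order>"
)
for problem in check_extensions(doc, {"urn:example:partner": partner_checker}):
    print(problem)
```

In practice each checker would presumably wrap a schema or Schematron
validation for that partner's vocabulary; the point is only that the
check has to happen outside the main schema when 'skip' is used.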

Section 9.4 - A running example of the proposed version extensibility

ubl2.xsd  -  processContents = 'skip' for 'Extension and
FutureVersions' - still think this could/should be 'lax', but I guess
it somewhat depends on whether UBL want to allow implementers to use
UBL namespaced items in an extension ?

ubl21.xsd  :-
  -  why no Extension within element 'LineItem' ?
  -  the LineItem content model is non-deterministic, isn't it ? (you
have an optional element (u21:CountryOrigin) declared before
'FutureVersions')

Haven't seen any other comments on this doc on this list. Should I be
looking elsewhere ?

Regards

Fraser.


On 10/06/06, G. Ken Holman <gkholman@cranesoftwrights.com> wrote:
> Hello all,
>
> I've posted a publicly-downloadable copy of my personal contribution
> to the UBL discussion of subsets, extensions, versions, validation
> and interchange for UBL 2.0 (note this is version 0.2 ... there may
> be follow-on versions):
>
> http://www.oasis-open.org/committees/download.php/18660/gkholman-ubl-modeling-0.2.zip
>
> If you are reading this post from the archives, check here for the
> latest version, sorted by date, named "gkholman-ubl-modeling-X.Y.zip":
>
> http://www.oasis-open.org/committees/documents.php?num_per_wg=100&wg_abbrev=ubl&sort_field=d1.submission_date
>
>
> Some of the ideas are radical, but I haven't been convinced by
> demonstrative examples of alternative approaches to meet the business
> requirements I've seen.
>
> I do present a pure W3C Schema and XSLT 1.0 approach to the problems,
> without any references to RELAX-NG or NVDL.  Some Schematron is used
> to create the XSLT used in the runtime process.
>
> I welcome suggestions for alternative approaches to those presented
> in my paper, backed up with demonstrative examples of functionality,
> not just quotes from a standards document or somebody else's paper
> ... the TC needs to see working code in order to be convinced
> decisions we are making are good for the long term.
>
> I would be very pleased if simpler solutions were found that meet all
> of the requirements.  If requirements as stated are incorrect, or
> they change, then of course my proposals may not be appropriate.
>
> I'm proposing very late (too late?) changes to the NDR for UBL to
> ensure the schemas we produce next are more resilient to change and
> extension than the draft schemas of January 2006.  I don't believe
> the NDR as they stand meet all our needs.
>
> I am focused on the technical aspects of the problems ... once the
> mechanics are set in place I feel we can then see what policy aspects
> of the problems need to be ... but I don't feel we can make policy
> decisions first and then decide on a technology like W3C XSD Schema
> and its constraints and then try to make something work when that
> technology won't let us do what we need.
>
> I hope this is considered constructive.
>
> . . . . . . . . . . . . . . Ken
>
> p.s. I apologize for how late this is being submitted, but my
> contribution to UBL is entirely volunteer and there are other
> obligations on my time ... I welcome assistance from anyone to
> evaluate and demonstrate alternative simpler solutions to the
> problems.  I will have a business interest in the end result since
> I'll be teaching it, so it is important to me to know if alternatives
> are going to work or not.
>
> --
> Registration open for XSLT/XSL-FO training: Wash.,DC 2006-06-12/16
> Also for XSL-FO/XSLT/XML training:    Birmingham, UK 2006-07-04/13
> Also for XSL-FO/XSLT training:    Minneapolis, MN 2006-07-31/08-04
> Also for XML/XSLT/XSL-FO/UBL training: Varo,Denmark 06-09-25/10-06
> World-wide corporate, govt. & user group UBL, XSL, & XML training.
> G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
> Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/u/
> Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
> Male Cancer Awareness Aug'05  http://www.CraneSoftwrights.com/u/bc
> Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
>