
Subject: Re: [ubl-dev] Re: Global elements doing UBL a disservice
From: Chin Chee-Kai <cheekai@softml.net>
To: UBL-Dev <ubl-dev@lists.oasis-open.org>
Date: Tue, 30 May 2006 12:01:30 +0800 (SGT)
On Mon, 29 May 2006, G. Ken Holman wrote:

>>At 2006-05-22 07:53 +0800, Chin Chee-Kai wrote:
>>.....
>>>Help me understand here, but I didn't see how the discussions of
>>>Stephen and Joseph and others like David, Fraser, Fulton and perhaps
>>>others I've not mentioned, had concluded beyond reasonable doubts
>>>(sorry, watching too much of Boston Legal... :)  that it is clear
>>>that W3C Schema features are not sufficient to the task, as you've
>>>put it.   It's not so clear to me at least.  How did the conclusion
>>>get drawn?  What exactly was the requirement and in what way is W3C
>>>schema inferior to meeting that requirement?
>>
>>Because all of the W3C mechanisms described up until that point, and
>>the ones discovered last week in Belgium end up not being able to
>>describe subsets and supersets of the UBL information set precisely
>>nor completely.

While I don't think W3C Schema is meant to describe every sort of
dataset, I'm not sure it is a strong argument to say that insisting
that W3C Schema describe UBL datasets *in a certain way*, and then
finding that it can't be done, conclusively proves that W3C Schema ends
up "not being able to describe subsets and supersets of the UBL
information set precisely nor completely".  After all, any subset or
superset of the UBL infoset *is* an XML infoset, and lies properly
within the value space that XSD can formally describe (from here on
I'll use "XSD" interchangeably with "W3C Schema").

The "in a certain way" above refers to UBL's way of modularising and
grouping certain types together in files, extending or restricting
types, preserving namespaces, etc.  While these considerations may be
genuine needs of UBL, they only translate into needing XSD to express
the types in the particular way that UBL has laid them out.  If XSD can
describe the finalised type (at run-time) more directly, then it has
already done its job of describing the data set, even if not in the
way that UBL insists on or is currently debating.



>>Three examples that I am putting into a white-paper I am writing on
>>this for discussion by the TC:
>>
>>(1) - if a subset wishes to elide an optional information item, I
>>believe there is no way to set the cardinality of a construct to zero
>>in a redefine, but this might not be a problem since I'm confused
>>about the features and limitations of redefine~

I'd agree that <redefine> isn't meant for arbitrary schema component
modification; as I read the XSD specification, it is intended for
versioning, i.e. for the owner of a schema to upgrade or further refine
his or her own schemas.  So if the present UBL model (as described by
the present version of the schemas) allows a type or element to have
minOccurs=0 with a finite or unbounded maxOccurs, then one certainly
could redefine it by restriction with maxOccurs=0 to "cut out" the
originally optional item.  But if the item wasn't optional in the
first place, <redefine> preserves integrity by not allowing one to
widen the cardinality range (e.g. changing the original's minOccurs=1
to minOccurs=0 in the redefinition).
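
To make the cardinality rule above concrete, here is a minimal Python
sketch of the occurrence-range check that, as I understand the
"Occurrence Range OK" constraint in XSD Part 1, a restricting particle
must pass: its minOccurs may not drop below the base's, and its
maxOccurs may not exceed the base's unless the base is unbounded.  The
function name occurrence_range_ok is hypothetical, and the sketch
deliberately ignores everything else a valid restriction must satisfy.

```python
from typing import Optional, Tuple

# (minOccurs, maxOccurs); None stands for maxOccurs="unbounded".
Occurs = Tuple[int, Optional[int]]


def occurrence_range_ok(base: Occurs, derived: Occurs) -> bool:
    """Sketch of XSD's "Occurrence Range OK" check: the derived
    (restricting) range must lie within the base range."""
    base_min, base_max = base
    derived_min, derived_max = derived
    if derived_max is not None and derived_min > derived_max:
        raise ValueError("minOccurs must not exceed maxOccurs")
    # 1. The lower bound may not be relaxed (e.g. minOccurs 1 -> 0).
    if derived_min < base_min:
        return False
    # 2. Any upper bound fits under an unbounded base ...
    if base_max is None:
        return True
    # ... otherwise it must be finite and no larger than the base's.
    return derived_max is not None and derived_max <= base_max


# An optional item (0..unbounded) may be "cut out" with maxOccurs=0 ...
print(occurrence_range_ok((0, None), (0, 0)))  # True
# ... but a required item (1..1) cannot be made optional.
print(occurrence_range_ok((1, 1), (0, 1)))     # False
```

The asymmetry is the point: narrowing is always checkable against the
base, while widening would let a "restricted" instance fail the
original schema.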

But more importantly, having re-read the <redefine> section a few more
times to be sure, I get the feeling that in this case it is intended
more for the UBL TC to <redefine> future versions of the UBL schemas,
rather than as the normatively recommended UBL way for end-users to
derive customised schemas.



>>(2) - if substitution groups or W3C extension techniques quoted add a
>>new information item to an existing information item, and this is not
>>done in the extension area, then instances of the extended document
>>are not validated against the base (which is, I believe, the basis of
>>global interoperability) which has the same namespace URI string

Sorry, but I'm a bit lost by your juxtaposing of substitution groups
with "or W3C extension techniques".  As I understand it, the
substitutionGroup attribute applies to <xsd:element> declarations
only, while "W3C extension techniques" would modify types, and the two
operate in rather different ways (e.g. as you know, a substitution
group can be headed by an abstract element).

Also, I'm not sure what you mean by "not done in the extension area".
Is that a particular xsd:anyType extension point that you're working
on, or do you mean <xsd:extension> in general?  I just want to make
sure I understand whether there's a particular observation you're
putting across.




>>(3) - the extension area cannot be declared as having "all but a set
>>of namespaces" for namespace-qualified children, only "all" or "all
>>others but this one", which is insufficient (and I was told what I
>>need cannot be done by W3C Schema experts in XML-Dev and W3C-Schema 
>>mail lists)
>>
>>For example, in RELAX-NG I can say:
>>
>>element UBLExtension =
>>  {
>>   element * - ( in:* | cbc:* | cac:* ) { ...........
>>
>>This would allow the extension point to have anything outside of
>>UBL-defined namespaces.  This is not possible in W3C Schema speak.

Certainly.  Ken, I think we're looking at the problem (I mean "problem"
in an academic sense, as business people don't like to hear "problem" :)
of deriving customised schemas from UBL rather differently.  That
difference may well persist throughout our conversations, but perhaps
along the way I can pick up further sharp observations from you and
the schema experts you've spoken to.
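
For anyone comparing the two notations, here is a minimal Python
sketch of the namespace test at issue.  Assumptions: the URIs below are
hypothetical placeholders for Ken's in:, cbc: and cac: prefixes (not the
real UBL namespace names), UBLExtension is assumed to live in the in:
namespace, and the code only models the name test, not validation.  In
XSD 1.0 a wildcard's namespace attribute can be ##any, ##other, or an
explicit list of allowed namespaces (which may include ##targetNamespace
and ##local); ##other means any namespace other than the target
namespace, with unqualified names excluded, so it cannot shut out cbc:
and cac: at the same time.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional

# HYPOTHETICAL placeholder URIs for the in:, cbc: and cac: prefixes;
# they are not the real UBL namespace names.
IN_NS = "urn:example:ubl:in"
CBC_NS = "urn:example:ubl:cbc"
CAC_NS = "urn:example:ubl:cac"
UBL_NAMESPACES = frozenset({IN_NS, CBC_NS, CAC_NS})


def namespace_of(tag: str) -> Optional[str]:
    """Namespace URI of an ElementTree tag ("{uri}local"), or None if unqualified."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def xsd_other_allows(ns: Optional[str], target_ns: str) -> bool:
    """Models an XSD 1.0 wildcard with namespace="##other": any namespace
    except the target namespace; unqualified names are not allowed."""
    return ns is not None and ns != target_ns


def rng_exclusion_allows(ns: Optional[str], excluded: Iterable[str]) -> bool:
    """Models the RELAX-NG name class '* - (in:* | cbc:* | cac:*)': any name,
    qualified or not, whose namespace is not in the excluded set."""
    return ns not in set(excluded)


def rejected_children(extension: ET.Element, excluded: Iterable[str]) -> List[str]:
    """Tags of the extension element's children that the RELAX-NG-style rule rejects."""
    excluded = set(excluded)
    return [
        child.tag
        for child in extension
        if not rng_exclusion_allows(namespace_of(child.tag), excluded)
    ]


# One child in the (placeholder) cbc: namespace, one in a private namespace.
sample = ET.fromstring(
    '<UBLExtension xmlns="urn:example:ubl:in" '
    'xmlns:cbc="urn:example:ubl:cbc" xmlns:my="urn:example:mine">'
    "<cbc:Note/><my:Extra/></UBLExtension>"
)
print([xsd_other_allows(namespace_of(c.tag), IN_NS) for c in sample])  # [True, True]
print(rejected_children(sample, UBL_NAMESPACES))  # ['{urn:example:ubl:cbc}Note']
```

With these placeholders, cbc:Note passes the ##other test (its
namespace differs from the assumed target namespace) but is rejected by
the RELAX-NG-style exclusion, which is the gap Ken describes; whether
that gap justifies a second schema language is the question under
discussion.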

I suppose if I pick a particular area of the XSD specs where it says
"cannot", "disallowed" or "prohibited" and frame my requirements
around it, I can easily claim that XSD is deficient in satisfying
my requirements.

For example, suppose I try to use XSD to describe an infoset that
has circular definitions of specific datatypes, or want to declare a
named and typed <xsd:element name="xxx" type="xxxType"> while also
hoping that, should the type not be defined because a referenced
<xsd:import> file is absent, the schema validator would fall back to
an immediately following local type (a child of this <xsd:element>),
and so on...  Some of these may be real needs of the surrounding
project, while others may just be wishful thinking, hoping to avoid
yet another layer of programming.

Either way, I suppose examples can be found of XSD's "weaknesses"
in satisfying special needs, but I'd be cautious about taking these
examples as reasons to introduce another "patching" language
(RELAX-NG, as in your example) to describe, alongside XSD, the same
UBL infoset that already lies properly within the value space XSD
can fully describe.  The added complexity, the need to keep the two
synchronised, the meaning of any disagreement between them (e.g. one
says OK, the other says not) and other consequent questions of
interpretation would more likely than not introduce new, and probably
unnecessary, areas of confusion, cost, loopholes, concerns and delays.




>>In my paper I'm proposing using the SBS method of describing subsets
>>which can be used to synthesize constraint expressions that people
>>may wish to use.  Those who want to use RELAX-NG can use it.  Those
>>who want to use W3C Schema can use it, but will have problems unless
>>they layer NVDL on top or a very extensive Schematron expression that
>>may be unwieldy.

SBS (the Small Business Subset) is a nice accessory to UBL as it
brings out the useful, common subset.  But it doesn't do much for
extending schemas.

Is the paper you are referring to the much-anticipated UBL
customisation paper?  How would you use SBS to extend a schema?


>>>It would be a rather surprising conclusion to draw, that UBL
>>>requirements are so high that W3C Schema cannot meet the kinds
>>>of requirements to describe the desired data sets.
>>
>>Or that W3C Schema semantics are so low that necessary markup
>>patterns needed to meet real-world requirements cannot be expressed
>>by the limitations of the language.  The RELAX-NG schema language has
>>sufficient semantics for the patterns that are needed.  W3C Schema
>>was developed for program-to-program exchange of inherited type
>>information ... it is not sufficiently flexible for the kinds of
>>markup patterns that we need.

Sure.  But I suppose that if the whole 3-4 years of UBL activity was
meant to "upgrade" the world's EDI from EDIFACT to an XML-based version
that offers <xsd:import source="advantages.xml"> advantages, then its
aim is to describe, using XML, the abstract value space formerly
described by EDIFACT using ASCII strings.  Why, and when, did the
structures of those datasets suddenly become so complicated that XSD,
which can describe SQL database schemas, can no longer describe them?
I don't really know, and I don't suspect such a change has happened.




>>>If so, what more
>>>would be needed of the expenses on getting the right software, testing
>>>for the correct implementation, and the bottom line, allowing more
>>>people (including SMEs) to use UBL?
>>
>>That depends ... if they use pure W3C Schema and pure W3C Schema is
>>not up to the task, and reducing UBL to satisfy W3C Schema to the
>>point of causing real-world problems in interchange, then they'll
>>have to take the risks.

At the risk of sounding repetitive: given that UBL infosets, customised
and uncustomised, lie properly within the XML infoset value space, and
that the drawbacks mentioned above are not offset by any clear
benefits, there is really no justifiable cause to introduce another
description language.




>>>Of course, if we design UBL
>>>schemas in a way that requires precisely what W3C schema cannot
>>>possibly offer, the question, if we allow for simplicity and
>>>compactness, then becomes, can the same set of data instances
>>>desired by UBL be described with only those facilities offered
>>>by W3C schema?
>>
>>Apparently not ... the experts have told me I cannot do what I want
>>to do.  I've heard the clamour for a pure W3C-Schema approach and I
>>have been agonizingly trying to find just a way to do that, for fear
>>of being branded a zealot for pushing unnecessary technologies.

Yes, I've seen some of your postings, and struggles.

Presumably it boils down to UBL settling on a direction for how to
advise end-users on customisation.  If there is no advice and
end-users still wish to benefit from UBL's work, they'll simply go
ahead with their own local customisation methods: some extending or
restricting existing types, some picking the elements they need
directly, and others reconstructing modified data models to obtain
straightforward customised types for validation directly.

Whichever way is chosen, it can be done, and perhaps already has been.
For local requirements (e.g. semantic verification, uniqueness checks
against an in-house database, etc.), some may use higher-layer
programming, some richly customised XSD schemas, some extra gateways,
and perhaps some even RELAX-NG.  But that is each implementer's
decision, based on non-technical factors as well.




>>>I certainly hope we're not at the juncture of seeing the emergence
>>>of requirements for other additional data description languages in
>>>UBL 2.0.  That might be too fast, too soon....
>>
>>"Additional yet to be conceived data description languages"?  No ...
>>there are existing ISO data description languages that work just fine
>>... it just happens that W3C Schema expression semantics aren't
>>powerful enough for straight-forward means.  It would be unfortunate
>>to compromise the data integrity to fit a tool, rather than find the
>>right tool to fit the data integrity.

I'd like to clarify that "yet to be conceived" weren't my words.
By "other additional data description languages" I was referring to
your suggestion of using RELAX-NG and/or other data description
languages, not suggesting that you're inventing new, unknown
languages.  Sorry if I didn't convey my meaning clearly.

But my difference of opinion here is not so much about whether the
data description language to be introduced already exists or is yet
to be conceived, ISO or non-ISO.  Just as having two processors
process the same data instance requires a lot of care in the
programming logic, I feel the same way about the suggestion of using
two parallel data description languages for the same infoset.

I don't quite see the point you're making about the weakness of XSD's
expression semantics for "straight-forward means".  Quite the
contrary: a straightforward, direct description of the desired infoset
in the value space is exactly what XSD would have no problem with.

You've also brought "data integrity" into the case against XSD.
Would that be stretching the accusation too far?  Are there particular
data integrity issues you've observed that might jeopardise even
normal, uncustomised use of XSD for UBL data instances?



>>I'm hoping to soon summarize my ideas so that I can get the opinions
>>of the committee.
>>
>>If the committee decides to loosen up the integrity to allow a pure
>>W3C Schema expression of the constraints, then the integrity checking
>>moves to user guidelines instead of formal expressions.
>>
>>I'll teach whatever the committee decides ... but while it is
>>deciding I will present opportunities that are available.

OK, thanks in advance for your time, and keep up the good work.
I'm finding these discussions very beneficial.



Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6820-2979
Email: cheekai@SoftML.Net
http://SoftML.Net/
References:
- Re: [ubl-dev] Re: Global elements doing UBL a disservice
  - From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>