Subject: Re: [ubl-dev] Re: Global elements doing UBL a disservice
On Mon, 29 May 2006, G. Ken Holman wrote:

>>At 2006-05-22 07:53 +0800, Chin Chee-Kai wrote:
>>.....
>>>Help me understand here, but I didn't see how the discussions of
>>>Stephen and Joseph and others like David, Fraser, Fulton and perhaps
>>>others I've not mentioned, had concluded beyond reasonable doubts
>>>(sorry, watching too much of Boston Legal... :) that it is clear
>>>that W3C Schema features are not sufficient to the task, as you've
>>>put it. It's not so clear to me at least. How did the conclusion
>>>get drawn? What exactly was the requirement and in what way is W3C
>>>schema inferior to meeting that requirement?
>>
>>Because all of the W3C mechanisms described up until that point, and
>>the ones discovered last week in Belgium end up not being able to
>>describe subsets and supersets of the UBL information set precisely
>>nor completely.

While I don't think W3C schema is meant to describe every sort of
dataset, I'm not sure it is a strong argument to say that insisting
that W3C schema describe UBL datasets *in a certain way*, and then
finding that it couldn't be done, proves conclusively that W3C schema
is not "being able to describe subsets and supersets of the UBL
information set precisely nor completely". After all, any subset or
superset of a UBL infoset *is* an XML infoset, and properly lies within
the value space which can be formally described by XSD (I'll just use
XSD in place of W3C schema equivalently).

The "in a certain way" above refers to UBL's way of modularising and
grouping certain types together in files, trying to extend or restrict
types, taking care of namespace preservation etc. While these
considerations may be needs of UBL, they translate only to needing XSD
to get the kind of types in the way that UBL has described.
If XSD can describe the finalised type (at run-time) in a more direct
way, then it has already performed its task of describing the data set,
even if not necessarily in the way insisted on or debated by UBL right
now.

>>Three examples that I am putting into a white-paper I am writing on
>>this for discussion by the TC:
>>
>>(1) - if a subset wishes to elide an optional information item, I
>>believe there is no way to set the cardinality of a construct to zero
>>in a redefine, but this might not be a problem since I'm confused
>>about the features and limitations of redefine~

I'd agree that <redefine> isn't meant for just any sort of schema
component modification; according to the XSD specs, it is supposedly
for versioning and modification by the owner of a schema, to upgrade or
refine his or her own schemas in future. So if the present UBL model
(as described by the present version of the schema) permits certain
types or elements to have minOccurs=0, maxOccurs=finite or unbounded,
then one certainly could <redefine-restrict> the maxOccurs to 0 to
"cut out" the originally optional item. But if the item wasn't
optional in the first place, XSD's <redefine> tries to ensure integrity
by not allowing one to extend the cardinality range (e.g. changing the
original's minOccurs=1 to an intended redefining cardinality of
minOccurs=0).

But more importantly, having read and re-read <redefine> a few more
times to be sure, I'm getting the feeling that it is more intended, in
this case, for the UBL TC to <redefine> future versions of UBL schemas,
rather than as a UBL normatively recommended way for end-users to
derive customised schemas.
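The cardinality rule in the reply to (1) can be sketched in a few lines of Python. This is only an illustration of the occurrence-range idea (a restriction may narrow a range but not widen it), not an XSD processor; the function name `is_valid_occurrence_restriction` and the tuple layout are assumptions made for this example, and `None` stands in for `unbounded`.

```python
from typing import Optional, Tuple

# (minOccurs, maxOccurs); a maxOccurs of None stands in for "unbounded".
# This representation is an assumption made for the sketch.
Occurs = Tuple[int, Optional[int]]


def is_valid_occurrence_restriction(base: Occurs, derived: Occurs) -> bool:
    """Return True if the derived range lies within the base range."""
    base_min, base_max = base
    derived_min, derived_max = derived
    # A restriction may raise minOccurs but never lower it.
    if derived_min < base_min:
        return False
    # An unbounded base accepts any derived maximum.
    if base_max is None:
        return True
    # A bounded base cannot be restricted to "unbounded" or to a larger maximum.
    return derived_max is not None and derived_max <= base_max


# Optional item (0..unbounded) cut out by restricting it to 0..0: allowed.
print(is_valid_occurrence_restriction((0, None), (0, 0)))  # True
# Mandatory item (1..1) loosened to optional (0..1): rejected.
print(is_valid_occurrence_restriction((1, 1), (0, 1)))  # False
```

The two calls mirror the two cases in the paragraph above: cutting out an item that was already optional, and trying to make a mandatory item optional.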
>>(2) - if substitution groups or W3C extension techniques quoted add a
>>new information item to an existing information item, and this is not
>>done in the extension area, then instances of the extended document
>>are not validated against the base (which is, I believe, the basis of
>>global interoperability) which has the same namespace URI string

Sorry, but I'm a bit lost in your juxtaposing of substitution groups
with "or W3C extension techniques". Mainly I understand
<xsd:substitutionGroup> to be affecting <xsd:element> only, while "or
W3C extension techniques" would modify types, and the two operate in
rather different manners (e.g. as you know, <substitutionGroup> can
operate with abstract elements). Also, I'm not sure what you mean by
"not done in the extension area". Is that a particular <xsd:anyType>
that you're working on, or do you mean <xsd:extension> in general? I
just want to make sure I understand if there's a particular
observation you're putting across.

>>(3) - the extension area cannot be declared as having "all but a set
>>of namespaces" for namespace-qualified children, only "all" or "all
>>others but this one", which is insufficient (and I was told what I
>>need cannot be done by W3C Schema experts in XML-Dev and W3C-Schema
>>mail lists)
>>
>>For example, in RELAX-NG I can say:
>>
>>element UBLExtension =
>> {
>> element * - ( in:* | cbc:* | cac:* ) { ...........
>>
>>This would allow the extension point to have anything outside of
>>UBL-defined namespaces. This is not possible in W3C Schema speak.

Certainly. Ken, I think we're looking at the problem (I mean "problem"
in an academic sense, as business people don't like to hear "problem"
:) of deriving customised schemas from UBL rather differently. That
alone may continue throughout our conversations, but perhaps along the
way I can pick up further acute observations from yourself and the
schema experts you've spoken to.
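For readers who want to see what the constraint in (3) amounts to, here is a minimal Python sketch that checks it in application code: direct children of an extension element must not come from a given set of namespaces. It is not a feature of XSD or RELAX-NG, only an illustration of the rule. The namespace URIs are placeholders (assumptions, not the real UBL namespaces), and the helper names `namespace_of` and `disallowed_children` are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Set

# Placeholder namespace URIs (assumptions, not the real UBL namespaces).
EXCLUDED_NAMESPACES: Set[str] = {
    "urn:example:in",
    "urn:example:cbc",
    "urn:example:cac",
}


def namespace_of(tag: str) -> str:
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""  # element is in no namespace


def disallowed_children(extension: ET.Element, excluded: Set[str]) -> List[str]:
    """List tags of direct children whose namespace is in the excluded set."""
    return [child.tag for child in extension if namespace_of(child.tag) in excluded]


doc = ET.fromstring(
    '<UBLExtension xmlns:cbc="urn:example:cbc" xmlns:my="urn:example:custom">'
    "<my:Extra/><cbc:ID/></UBLExtension>"
)
print(disallowed_children(doc, EXCLUDED_NAMESPACES))  # ['{urn:example:cbc}ID']
```

An empty result means every child lies outside the excluded namespaces. Children with no namespace at all are accepted by this sketch; whether that is desirable is a separate decision.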
I suppose if I pick a particular area in the XSD specs where it says
"cannot" or "disallowed" or "prohibited" and frame my requirements
around it, I can easily claim that XSD is deficient in satisfying my
requirements. For example, suppose I try to use XSD to describe an
infoset that has circular definitions of specific datatypes, or want
to define a named and typed
<xsd:complexType name="xxx" type="xxxType"> while also hoping that,
should the type not be defined due to the absence of a referenced
<xsd:import> file, the schema validator would default to an
immediately succeeding local type (child of this <xsd:complexType>),
and so on... Some may be real needs from the surrounding project, while
some may just be wishful thinking, hoping to avoid another layer of
extra programming.

Either way, I suppose examples can be shown of XSD's "weaknesses" in
satisfying special needs, but I'd be on the cautious side about taking
these examples as reasons to introduce another "patching" language
(RELAX-NG, as per your example) to co-describe with XSD the same UBL
infoset, which already lies properly in the value space that XSD can
fully describe. The complexity, synchronisation, meaning of duality
(e.g. one says ok, the other says not) and other consequential
interpretations more likely than not introduce new, and likely
unnecessary, areas of confusion, cost, loopholes, concerns, and delays.

>>In my paper I'm proposing using the SBS method of describing subsets
>>which can be used to synthesize constraint expressions that people
>>may wish to use. Those who want to use RELAX-NG can use it. Those
>>who want to use W3C Schema can use it, but will have problems unless
>>they layer NVDL on top or a very extensive Schematron expression that
>>may be unwieldy.

SBS is a nice accessory to UBL as it brings out the useful and common
subset. But it doesn't do much for extending schemas. Is the paper you
are referring to the much anticipated UBL customisation paper?
How would you use SBS to extend schema?

>>>It would be a rather surprising conclusion to draw, that UBL
>>>requirements are so high that W3C Schema cannot meet the kinds
>>>of requirements to describe the desired data sets.
>>
>>Or that W3C Schema semantics are so low that necessary markup
>>patterns needed to meet real-world requirements cannot be expressed
>>by the limitations of the language. The RELAX-NG schema language has
>>sufficient semantics for the patterns that are needed. W3C Schema
>>was developed for program-to-program exchange of inherited type
>>information ... it is not sufficiently flexible for the kinds of
>>markup patterns that we need.

Sure, I suppose the whole 3-4 years of UBL activities were to
"upgrade" the world's EDI from EDIFACT to an XML-based version that
offers <xsd:import source="advantages.xml"> advantages; it is to
describe using XML the abstract value space formerly described by
EDIFACT using ASCII strings. Why did, and when have, the structures of
the datasets suddenly become so complicated that XSD, which can
describe SQL database schemas, cannot now describe them? I don't
really know, and I don't suspect that change has happened.

>>>If so, what more
>>>would be needed of the expenses on getting the right software, testing
>>>for the correct implementation, and the bottom line, allowing more
>>>people (including SMEs) to use UBL?
>>
>>That depends ... if they use pure W3C Schema and pure W3C Schema is
>>not up to the task, and reducing UBL to satisfy W3C Schema to the
>>point of causing real-world problems in interchange, then they'll
>>have to take the risks.

At the risk of sounding repetitive: given that the UBL infosets,
customised and uncustomised, lie properly within the XML infoset value
space, and that the drawbacks mentioned above are not offset by any
clear benefits, there is really no justifiable cause to introduce
another description language.
>>>Of course, if we design UBL
>>>schemas in a way that requires precisely what W3C schema cannot
>>>possibly offer, the question, if we allow for simplicity and
>>>compactness, then becomes, can the same set of data instances
>>>desired by UBL be described with only those facilities offered
>>>by W3C schema?
>>
>>Apparently not ... the experts have told me I cannot do what I want
>>to do. I've heard the clamour for a pure W3C-Schema approach and I
>>have been agonizingly trying to find just a way to do that, for fear
>>of being branded a zealot for pushing unnecessary technologies.

Yes, I've seen some of your postings, and struggles. Presumably it
boils down to having a decided direction, for UBL, on how to advise
end-users on customisations. Should there be no advice, and should
end-users wish to benefit from UBL's work, they'd just go ahead with
their own local customisation methods: some choosing to extend or
restrict based on existing types, some just picking the elements
directly, while others may reconstruct modified data models to
directly obtain straight-forward customised types for validation.
Whichever way, they can be, and perhaps have been, done. For those
local requirements (e.g. semantic verification, uniqueness against an
in-house database, etc), some may use higher-layer programming, some
richly customised XSD schemas, some extra gateways, or perhaps some
even RELAX-NG. But it's all the implementers' decision, based on other
non-technical factors as well.

>>>I certainly hope we're not at the juncture of seeing the emergence
>>>of requirements for other additional data description languages in
>>>UBL 2.0. That might be too fast, too soon....
>>
>>"Additional yet to be conceived data description languages"? No ...
>>there are existing ISO data description languages that work just fine
>>... it just happens that W3C Schema expression semantics aren't
>>powerful enough for straight-forward means.
>>It would be unfortunate
>>to compromise the data integrity to fit a tool, rather than find the
>>right tool to fit the data integrity.

I'd like to clarify that the "yet to be conceived" aren't my words. By
saying "other additional data description languages", I was referring
to your suggestion of using RELAX-NG, and/or other data description
languages, and not suggesting that you're inventing new, unknown
languages. Sorry if I've not conveyed my meaning clearly.

But I suppose my difference in opinion here is not so much whether the
to-be-introduced data description language is conceived already,
yet-to-be-conceived, ISO or non-ISO. Much like having 2 processors
processing the same data instance would require a lot of care in
handling programming logic, I feel the same about the suggestion to
use 2 parallel data description languages for the same infoset.

I don't see the impact you're trying to convey regarding the weakness
of XSD expression semantics for straight-forward means. Quite the
contrary, a straight-forward, direct description of the desired
infoset in the value space is exactly what XSD wouldn't have a problem
with.

You've also linked "data integrity" to the weakness of XSD. Would that
be expanding the accusation against XSD too far? Are there particular
data integrity issues you have observed that may jeopardise even
normal, uncustomised use of XSD for UBL data instances?

>>I'm hoping to soon summarize my ideas so that I can get the opinions
>>of the committee.
>>
>>If the committee decides to loosen up the integrity to allow a pure
>>W3C Schema expression of the constraints, then the integrity checking
>>moves to user guidelines instead of formal expressions.
>>
>>I'll teach whatever the committee decides ... but while it is
>>deciding I will present opportunities that are available.

Ok, thanks in advance for your time, and keep the nice work going.
I'm finding much benefit in having such discussions.

Best Regards,
Chin Chee-Kai
SoftML
Tel: +65-6820-2979
Fax: +65-6820-2979
Email: cheekai@SoftML.Net
http://SoftML.Net/