obix-xml message

Subject: What makes a good schema?

From: "Considine, Toby (Facilities Technology Office)" <Toby.Considine@fac.unc.edu>
To: "'obix-xml@lists.oasis-open.org'" <obix-xml@lists.oasis-open.org>
Date: Sat, 26 Jun 2004 10:57:55 -0400

Some in this SC are old XML pros

Some are Controls guys who recognize a gt and lt when they see them

Unfortunately, I am closer to the latter.

The UDDI group has recently begun a conversation with us (oBIX), wondering what UDDI might need to do to accommodate oBIX. This has induced me to burrow more deeply into UDDI than I had previously. I found the following document on what makes a good schema, published by the UDDI group, very thought-provoking.

http://www.uddi.org/pubs/SchemaCentricCanonicalization-20020710.htm

On a similar note, I have long been concerned about how oBIX relates to the XML spaces around it. A seasoned practitioner of any engineering or software trade learns to recognize a good solution within that craft, even before exploring all the details of that solution. To this end, I have been pondering: how does one recognize a good schema? What heuristics distinguish a good schema from a less good one?

I think that one heuristic might be that a good schema can be easily transformed into and out of the XML spaces that surround it. Mechanistically, a good schema is one for which one can easily (or relatively easily) develop an XSLT. Fortunately, we have an XSLT conformance TC here in OASIS, so I asked them. I got a couple of good answers that I wanted to share with this SC.

===================================================
From: david_marston@us.ibm.com [mailto:david_marston@us.ibm.com]
Sent: Thursday, June 10, 2004 10:44 AM
To: Considine, Toby (Facilities Technology Office)
Cc: xslt-conformance@lists.oasis-open.org
Subject: Re: [xslt-conformance] Introduction and plea for advice

>I am pondering the mirror image question; what makes an XML dialect fit for a good XSLT?

XSLT is designed to be very flexible about both extracting data from nodes and producing output nodes holding the requested data. The main design point I see is that individual data values should be kept separate, to minimize the use of functions like substring() to pull values apart. You can see that design point reflected in our design for the catalog of test cases; that's why items like file-path and file-name are separate. Beyond that, you should just follow the good practices for element vs. attribute, namespace usage, etc.
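A minimal sketch of the "keep individual values separate" point, written in Python with the standard-library ElementTree rather than XSLT. The sample documents and the test-case and file element names are hypothetical; only file-path and file-name come from the message above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog entries. Only the names file-path and file-name
# come from the message; everything else is invented for illustration.
COMBINED = "<test-case><file>axes/ancestor/axes01.xsl</file></test-case>"
SEPARATE = (
    "<test-case>"
    "<file-path>axes/ancestor</file-path>"
    "<file-name>axes01.xsl</file-name>"
    "</test-case>"
)


def name_from_combined(xml_text):
    # The consumer has to pull the value apart, which is the analogue of
    # chaining substring functions in an XSLT 1.0 stylesheet.
    full = ET.fromstring(xml_text).findtext("file")
    return full.rsplit("/", 1)[-1]


def name_from_separate(xml_text):
    # Each value is its own node, so a plain path expression is enough.
    return ET.fromstring(xml_text).findtext("file-name")


print(name_from_combined(COMBINED))
print(name_from_separate(SEPARATE))
```

Both functions return the same file name, but only the first one depends on knowing how the combined string is delimited.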

If you plan to have an XML Schema for your vocabulary, you will have to discuss data types. XSLT 2.0 is designed to harmonize with those types (see the W3C Schema Part 2 document) and probably doesn't impose any additional constraints.

>Is it a measure of the quality of an XML dialect that it can be ... translated into another dialect?

Definitely. It's surprising how often you need to perform transformations.
.................David Marston
(speaking for himself, not the TC)

===================================================

From: Zarella Rendon [mailto:zarella@xml-factor.com]
Sent: Thursday, June 10, 2004 12:45 PM
To: Considine, Toby (Facilities Technology Office)
Subject: RE: [xslt-conformance] Introduction and plea for advice

Hi Toby,

I've been doing transformations using many different software packages for years. The only issue I've found with XSLT 1.0 involves grouping. If your data does not have enough levels of parent element wrappers, sibling elements of different types may be hard to access and process individually. This issue has been addressed and fixed with grouping in XSLT 2.0. However, if you have control over your schema, it doesn't hurt to do some analysis to see if you have structures that would benefit from additional wrappers. Here's an example:

<note/>

</wrapper>

In this example, <item1> is really the first item in a group, so a group wrapper would help the transformation process.
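The grouping difficulty can be sketched in Python with the standard-library ElementTree. This is an assumed illustration, not the example from the message: the list, heading, item, and group element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the same content without and with group wrappers.
FLAT = (
    "<list>"
    "<heading>A</heading><item>1</item><item>2</item>"
    "<heading>B</heading><item>3</item>"
    "</list>"
)
WRAPPED = (
    "<list>"
    "<group><heading>A</heading><item>1</item><item>2</item></group>"
    "<group><heading>B</heading><item>3</item></group>"
    "</list>"
)


def groups_from_flat(xml_text):
    # Without wrappers the consumer must walk the siblings in order and
    # infer where each group starts (here: at every heading element).
    groups = []
    for child in ET.fromstring(xml_text):
        if child.tag == "heading":
            groups.append((child.text, []))
        else:
            groups[-1][1].append(child.text)
    return groups


def groups_from_wrapped(xml_text):
    # With wrappers each group is a node that can be selected directly.
    root = ET.fromstring(xml_text)
    return [
        (group.findtext("heading"), [item.text for item in group.findall("item")])
        for group in root.findall("group")
    ]


print(groups_from_flat(FLAT))
print(groups_from_wrapped(WRAPPED))
```

The flat version only works because of an unstated rule about sibling order; the wrapped version states the grouping in the markup itself.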

Hope this helps.

--------------------------

Zarella Rendon

Managing Director

XML-Factor, Inc.

www.xml-factor.com

-----Original Message-----
From: Considine, Toby (Facilities Technology Office) [mailto:Toby.Considine@fac.unc.edu]
Sent: Thursday, June 10, 2004 10:25 AM
To: 'david_marston@us.ibm.com'
Cc: xslt-conformance@lists.oasis-open.org
Subject: RE: [xslt-conformance] Introduction and plea for advice

Thanks David.

I will do some reading based upon your response, and would welcome any URLs you can send to broaden my education. A related question is "If you knew the three vocabularies you needed to transform to, how would you go about evaluating different proposed schemas inside your own dialect?" This knowledge makes the question different than the generic "be ready to transform to anything", which would suggest schemas that are fully normalized in a way that few schemas I have seen are.

In the last 20 years, I have transformed a lot of data into other data, but I haven't sat down and formally thought about what structures would be pre-adapted for ease and completeness of transformation - well, except in the usual DBA formal normalization kind of way; and I'm not sure that those reflexes are entirely useful for this scenario.

I'm thinking that there is a more generic rule which suggests that a quality XML schema, one that is well thought out, can be recognized by the ease with which one can generate a working XSLT - and that this issue is related to what one wrestles with in XSLT conformance.

tc

===================================================

From: david_marston@us.ibm.com [mailto:david_marston@us.ibm.com]
Sent: Thursday, June 10, 2004 4:30 PM
To: Considine, Toby (Facilities Technology Office)
Cc: xslt-conformance@lists.oasis-open.org
Subject: RE: [xslt-conformance] Introduction and plea for advice

>A related question is "If you knew the three vocabularies you needed to transform to, how would you go about evaluating different proposed schemas inside your own dialect." This knowledge makes the question different than the generic be ready to transform to anything...

That's true, and I have been working on some troublesome XML structures. I may be able to report some specific experiences later this year. A typical problem I encounter is when I am transforming a specific sub-tree and some value is essentially a cross-reference that needs to be looked up in another tree. If the data can't be right there in the sub-tree, then it should be in a "side tree" as high up in the whole document as possible. For example...
Worst case - given the $ItemID, look it up on the Part list for this Page (and I don't know how deep down I am, or how deep the Part list is):
select="ancestor::Page[1]//Part[@ID=$ItemID]/Text"
Not as bad case - given the $ItemID, look it up on the Part list for this Page (which is my grandparent node, and the Part list is its child):
select="../../Part[@ID=$ItemID]/Text"
Better - given the $ItemID, look it up on the universal Part list:
select="/Part[@ID=$ItemID]/Text"
So you can evaluate the design (schema) of the vocabulary by looking at how bad the XPath expressions will have to be for the three transformations. For the above case, notice the potential to use xsl:key.
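A rough Python analogue of the lookup cases above, using the standard-library ElementTree. The sample document, its Doc, Parts, and Item element names, and the helper functions are assumptions made for this sketch. The dictionary index plays a role similar to xsl:key: build the ID-to-node mapping once, then look values up by key.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one Part list near the top ("side tree") and
# items deeper down that cross-reference it by ID.
DOC = """
<Doc>
  <Parts>
    <Part ID="p1"><Text>Bolt</Text></Part>
    <Part ID="p2"><Text>Nut</Text></Part>
  </Parts>
  <Page>
    <Section><Item ref="p2"/></Section>
  </Page>
</Doc>
"""


def text_by_path(root, item_id):
    # One fixed path from the root: no need to know how deep the caller is.
    node = root.find(f"./Parts/Part[@ID='{item_id}']/Text")
    return None if node is None else node.text


def build_part_index(root):
    # Built once, like declaring a key over Part elements by @ID.
    return {part.get("ID"): part.findtext("Text") for part in root.iter("Part")}


root = ET.fromstring(DOC)
index = build_part_index(root)
for item in root.iter("Item"):
    ref = item.get("ref")
    print(ref, text_by_path(root, ref), index[ref])
```

Because the Part list sits at a known place, each lookup is a single expression or a single dictionary access, whichever subtree the transformation happens to be processing.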

You want important properties of the data to be findable by evaluating a single expression, as opposed to having to set up a recursive search loop. In the above case, this means that I tested the string equality of @ID and $ItemID, rather than having to execute some other search/match process.

Another thing to think about is some of the xsl:for-each loops that might be needed in the three transformations.

Once your XML vocabulary escapes into the wild, others will probably write stylesheets beyond the three you anticipate. I suspect that if you have a good range of possibilities within the three, you will have accommodated many other needs.
.................David Marston
(speaking for himself, not the TC)

===================================================
Toby Considine      ! "Do the right thing. It will
UNC Chapel Hill     ! gratify some people and
Chapel Hill, NC     ! astonish the rest."
Phone (919)962-9073 !
Fax (919)962-1102   !            --Mark Twain
tobias@fac.unc.edu !
===================================================