relax-ng message

Subject: Re: Alternative approach for key/keyref

From: James Clark <jjc@jclark.com>
To: Murata Makoto <mura034@attglobal.net>, relax-ng@lists.oasis-open.org
Date: Wed, 11 Jul 2001 08:28:54 +0700

I think path-expression-based approaches to key/keyRef constraints are very 
useful, and I am glad you have figured out how to ensure consistency 
between a RELAX NG grammar and such constraints, at least as far as 
datatypes are concerned.  Am I right in thinking there are other 
consistency issues apart from datatypes?

The question arises of how such path-expression-based approaches fit in 
with RELAX NG.  With a path-expression-based approach, elements and 
attributes can play three roles:

a) They can be the root of a subtree within which some set of keys is unique

b) They can be the target (the common ancestor of the various fields): the 
object that conceptually is identified by the key

c) They can be the individual fields that together identify a particular 
target

Now W3C XML Schema uses paths for (b) and (c) but not for (a).  The roots of 
the subtrees are identified by adding <key> and <keyRef> elements to the 
element declarations for the elements that serve as roots.  I think this is 
not such a good approach.  It makes it harder for applications to identify 
which elements and attributes are serving as keys: you cannot just give the 
applications the paths; instead they need a PSVI from the key/keyRef 
validator.  Another problem with W3C XML Schema is that it has 
hierarchical keys but doesn't have hierarchical references. My gut feeling 
is that supporting hierarchical references is made harder by not using 
paths for (a).

I believe that a path-based approach should use paths for all of (a), (b) 
and (c).  However, if a key/keyRef constraint language uses paths for all 
of (a), (b) and (c), there is no need for it to be syntactically integrated 
into the grammar.  The specification of the key/keyRef constraints can be 
syntactically completely separate.  Semantically it can't be completely 
separate because of issues of consistency, and because probably the 
key/keyRef constraints should use the datatypes specified in the grammar.
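
As an illustration only, the following minimal Python sketch shows a uniqueness constraint stated entirely as three paths, kept separate from any grammar.  The document, the element names and the function names are hypothetical, the paths use the limited XPath subset of Python's xml.etree.ElementTree rather than any proposed syntax, and fields are compared as plain strings, so the datatype issue mentioned above is ignored.

```python
import xml.etree.ElementTree as ET

def field_value(target, field):
    """Read one field: '@name' reads an attribute, anything else is a child path."""
    if field.startswith("@"):
        return target.get(field[1:])
    return target.findtext(field)

def duplicate_keys(root, scope_path, target_path, fields):
    """Return (scope, key) pairs for keys that occur more than once within a scope."""
    duplicates = []
    for scope in root.findall(scope_path):           # role (a): root of the subtree
        seen = set()
        for target in scope.findall(target_path):    # role (b): object identified by the key
            key = tuple(field_value(target, f) for f in fields)  # role (c): the fields
            if key in seen:
                duplicates.append((scope, key))
            seen.add(key)
    return duplicates

# Hypothetical instance document: keys must be unique per <library>, not globally.
DOC = """
<libraries>
  <library name="north">
    <book edition="1"><isbn>111</isbn></book>
    <book edition="2"><isbn>111</isbn></book>
  </library>
  <library name="south">
    <book edition="1"><isbn>111</isbn></book>
    <book edition="1"><isbn>111</isbn></book>
  </library>
</libraries>
"""

root = ET.fromstring(DOC)
dups = duplicate_keys(root, "./library", "./book", ["isbn", "@edition"])
result = [(scope.get("name"), key) for scope, key in dups]
print(result)
```

Because the scope, target and fields are all plain paths, an application can be handed the same three paths to find out which elements and attributes act as keys, without needing any output from the validator.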

I think syntactically a path-based key/keyRef language will likely have a 
very different flavour from RELAX NG.  It is very natural to use an 
XPath-style syntax for paths. XPath has its own way of specifying name 
classes (*, foo:*), which is different from that used by RELAX NG.

This suggests to me that the right approach is to have a "RELAX Keys" schema 
that specifies key/keyRef constraints using paths and points to or 
includes a RELAX NG grammar (or maybe the binding between a RELAX Keys 
schema and a RELAX NG grammar could be done with RELAX Namespaces).

What does this imply for the key/keyRef facility in RELAX NG at the moment? 
I don't believe we should get rid of it.  I believe RELAX NG needs to have 
built in at least ID/IDREF functionality. I think people can very easily 
migrate from ID/IDREF to our keys; I think the migration from ID/IDREF to a 
path-based approach is much harder for users.  Our approach gains 
significant ease of use by being syntactically integrated in the grammar. 
Section 7.4 looks long and complicated because it spells everything out 
formally and rigorously with no hand-waving.  Even if we simplified it to 
be closer to ID/IDREF (by for example not allowing keys to have separate 
symbol spaces), 7.4 would not be much simpler.

There are two things I think we should consider doing:

1. Adding a note explaining that our key/keyRef facility is intended as a 
modest, very easy to use increment on ID/IDREF, not as a comprehensive 
solution for key/keyRef constraints, and that more comprehensive solutions 
can be layered on top of RELAX NG.

2. Possibly renaming key/keyRef to something else.  Leave the name 
key/keyRef free for future, more comprehensive path-based approaches. 
Perhaps use a term closer to ID/IDREF.  I think

<id>
  <data type="token"/>
</id>

is quite natural. Using "name" as the name of the attribute indicating the 
ID symbol doesn't seem to fit very well with <id>, <idref>.  Maybe one 
could use

<id class="...">...</id>

Such a "class" attribute should probably be optional.

James

Follow-Ups:
- Re: Alternative approach for key/keyref
  - From: Murata Makoto <mura034@attglobal.net>
- Re: Alternative approach for key/keyref
  - From: Kohsuke KAWAGUCHI <kohsuke.kawaguchi@eng.sun.com>

References:
- Alternative approach for key/keyref
  - From: Murata Makoto <mura034@attglobal.net>