Subject: Re: ID/IDREF problem
From: James Clark <jjc@jclark.com>
To: trex@lists.oasis-open.org
Date: Thu, 15 Mar 2001 15:07:23 +0700

I am glad you raised the issue of ID and IDREF.  It's part of the
broader issue of what to do about uniqueness and cross-reference
constraints (which XML Schema Part 1 calls "identity-constraints").  I
don't have a solution to propose, but let me offer some random thoughts.

ID is very important from an XML/SGML legacy perspective.  IDREF(S) is
slightly less important.  In the context of the WWW, cross references
more often use URI references than IDREF(S).

Apart from the legacy issue, ID/IDREF is a very weak way to do identity
constraints. It doesn't support: scoping ID/IDREFs to a particular
element; multipart keys; constraining IDREFs to point to the right kind
of element; multiple distinct symbol spaces of IDs; having the identity
of an element be specified by a child.

With ID, for many applications it is not sufficient that an attribute so
labelled uniquely identify an element.  The application also needs to
know that the attribute is serving as an ID.  For many applications,
this means that the attribute has to be declared as an ID in the DTD. 
For example, if you want to use id() in XSLT or XPath, or use #name in
XPointer, then you need attributes declared as IDs in the DTD. Declaring
it as an ID in a TREX pattern isn't sufficient. This suggests the need
for a datatype which checks whether the type of an attribute in the
infoset is ID.

In the XML Schema Datatypes CR, the uniqueness/cross-reference
constraints are part of the semantics of the ID/IDREF datatypes. This
doesn't really fit with XML Schema Datatypes' conceptual framework of
what a datatype is. However, in the PR these constraints are moving into
Part 1 (see
http://www.w3.org/2000/12/xmlschema-crcomments.html#ID-value-space).

As Kawaguchi-san points out, you need some special constraints on how
ID/IDREF are used to make validation of ID/IDREF workable. (RELAX
includes such constraints.) It is possible to adapt my proposed
algorithm for datatype assignment in TREX to provide such constraints
for TREX.  (When used for this, you would only consider ID/IDREF
datatypes when determining whether a set of character elements is
ambiguous.) A problem in having special constraints for the use of
ID/IDREF datatypes in TREX is that it is undesirable (in my view) to
make TREX depend on XML Schema Part 2. One possibility would be to
introduce into TREX itself support for ID/IDREF (e.g. additional <id/>,
<idref/> and <idrefs/> patterns or an additional attribute on the
<attribute/> element), along with appropriate constraints (such as in 9)
to enable easy validation. On the other hand I find it rather
distasteful to clutter the core TREX language with support for something
that should in my opinion be viewed as a legacy construct.
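
To make the difficulty in Kawaguchi-san's example concrete, here is a
rough Python sketch that tries every reading of the quoted instance by
brute force. It is an illustration only, not the datatype assignment
algorithm mentioned above. The function name valid_readings and the
(id, ref) tuple encoding are made up for the example, and it assumes an
IDREF may only point at an id that is itself read as an ID.

```python
from itertools import product

def valid_readings(foos):
    """Return every consistent reading of a list of (id, ref) pairs.

    Each reading is a tuple of booleans, one per <foo>: True means the
    ID/IDREF branch of the choice, False means the string/string branch.
    """
    readings = []
    for reading in product([False, True], repeat=len(foos)):
        # Collect the values that this reading treats as IDs and IDREFs.
        ids = [i for (i, _), as_id in zip(foos, reading) if as_id]
        refs = [r for (_, r), as_id in zip(foos, reading) if as_id]
        ids_unique = len(ids) == len(set(ids))
        refs_resolve = all(r in ids for r in refs)
        if ids_unique and refs_resolve:
            readings.append(reading)
    return readings

# The three <foo> elements of the example instance, as (id, ref) pairs.
foos = [("x", "notDefinedSoThisMustBeString"), ("x", "y"), ("y", "x")]
for reading in valid_readings(foos):
    print(reading)
```

Of the eight candidate readings only two survive: all three elements
read as strings, or the first read as a string and the other two as
ID/IDREF. The loop visits 2^n readings for n elements, which is the
kind of blow-up that constraints on the use of ID/IDREF are meant to
rule out.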

In addition to ID/IDREF, XML Schema Part 1 has a new facility for
identity-constraints (using the xsd:unique, xsd:key, xsd:keyref,
xsd:selector, xsd:field elements; see
http://www.w3.org/TR/xmlschema-1/#Identity-constraint_Definition_details).
This is quite a good facility.  The only thing I think it lacks is
proper support for hierarchical references.  On the other hand the
elements supporting identity-constraints have the feel of a separate
language, which could easily be layered on top of the rest of Schema
Part 1.
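
As a loose approximation of what such a facility checks, the Python
sketch below tests a key and a key reference that are scoped to one
element and made of two parts, both things that ID/IDREF cannot
express. It does not implement the XML Schema Part 1 semantics: the
selector is an ElementTree path rather than full XPath, fields are
restricted to attribute names, and the document, element names and
function names are all invented for the example.

```python
import xml.etree.ElementTree as ET

def key_of(el, fields):
    """Build a multipart key from the named attributes of an element."""
    return tuple(el.get(f) for f in fields)

def duplicate_keys(scope, selector, fields):
    """Return keys that occur more than once among scope's selected elements."""
    seen, dups = set(), []
    for el in scope.findall(selector):
        k = key_of(el, fields)
        if k in seen:
            dups.append(k)
        seen.add(k)
    return dups

def dangling_refs(scope, ref_selector, key_selector, fields):
    """Return references in scope that match no key in the same scope."""
    keys = {key_of(el, fields) for el in scope.findall(key_selector)}
    refs = [key_of(el, fields) for el in scope.findall(ref_selector)]
    return [r for r in refs if r not in keys]

# Hypothetical instance: books are keyed by (series, number) per library.
DOC = ET.fromstring("""
<catalog>
  <library name="a">
    <book series="s" number="1"/>
    <book series="s" number="2"/>
    <loan series="s" number="2"/>
  </library>
  <library name="b">
    <book series="s" number="1"/>
    <loan series="s" number="3"/>
  </library>
</catalog>
""")

FIELDS = ["series", "number"]
report = {
    lib.get("name"): (
        duplicate_keys(lib, "book", FIELDS),
        dangling_refs(lib, "loan", "book", FIELDS),
    )
    for lib in DOC.findall("library")
}
print(report)
```

Book (s, 1) appears in both libraries without clashing because each
check is scoped to a single library element, while the loan of (s, 3)
in library "b" is reported as dangling.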

We could support these XML Schema Part 1 elements for identity
constraints in TREX by allowing <element> elements to have arbitrary
elements from other namespaces with a global attribute
trex:role="constraint". Just like you can plug in a datatyping language
to TREX, so you would be able to plug in an identity-constraint language.
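
A rough sketch of how a processor might pick out such plugged-in
constraints follows. The trex:role attribute is only the suggestion
made above, not an existing TREX feature. The namespace URIs, the
pattern and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; adjust to whatever the pattern actually uses.
TREX_NS = "http://www.thaiopensource.com/trex"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical pattern: an <element> carrying a foreign xsd:key child.
PATTERN = """
<element name="root" xmlns="http://www.thaiopensource.com/trex"
         xmlns:trex="http://www.thaiopensource.com/trex"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:key name="fooKey" trex:role="constraint">
    <xsd:selector xpath="foo"/>
    <xsd:field xpath="@id"/>
  </xsd:key>
  <zeroOrMore>
    <element name="foo">
      <attribute name="id"/>
    </element>
  </zeroOrMore>
</element>
"""

def plugged_in_constraints(pattern_element):
    """Return foreign-namespace children marked trex:role="constraint"."""
    role_attr = "{%s}role" % TREX_NS
    trex_prefix = "{%s}" % TREX_NS
    return [
        child
        for child in pattern_element
        if child.get(role_attr) == "constraint"
        and not child.tag.startswith(trex_prefix)
    ]

root = ET.fromstring(PATTERN)
found = plugged_in_constraints(root)
for constraint in found:
    print(constraint.tag, constraint.get("name"))
```

The pattern matcher would skip these children, and a separate
identity-constraint module would receive them, much as a datatype
library receives the datatype names it is asked to check.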

However, there is some awkwardness in combining the XML Schema Part 1
syntax for identity constraints with the TREX pattern syntax.  For
example, TREX patterns have one way to specify classes of names; XML
Schema Part 1 (via XPath) has a completely separate way. It would be
possible to come up with a syntax for identity constraints that was much
more harmonious with TREX than the XML Schema Part 1 syntax.

It would also be possible to handle identity constraints as a completely
separate language.  In addition to having a TREX pattern specifying the
structure, you would also have an identity constraint specification. 
This would be much more convenient if there were some nice way to
package the two together.

There's an interaction between identity-constraints and datatyping.  You
may want to do comparison of keys according to the datatypes of the keys
rather than just lexically comparing strings.
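
A small Python sketch of the difference, using Decimal as a stand-in
for a decimal datatype (the duplicates helper and the sample keys are
invented for the example):

```python
from decimal import Decimal

def duplicates(values, parse=str):
    """Return the values whose parsed form repeats an earlier one."""
    seen, dups = set(), []
    for value in values:
        parsed = parse(value)
        if parsed in seen:
            dups.append(value)
        seen.add(parsed)
    return dups

keys = ["1", "1.0", "01", "2"]
print(duplicates(keys))           # compared as strings
print(duplicates(keys, Decimal))  # compared as decimal numbers
```

Compared as strings the four keys are all distinct; compared as
decimals, "1.0" and "01" both collide with "1".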

James

Kohsuke KAWAGUCHI wrote:
> 
> I'd like to bring up the problem of ID and IDREF. ID and IDREF are very
> problematic for validating processor.
> 
> Is there any plan to add some constraint to TREX to avoid this problem?
> Or should processors be capable of handling this?
> 
> Consider the following grammar.
> 
> <element name="root">
>   <choice>
>     <!-- foo of ID/IDREF -->
>     <element name="foo">
>       <attribute name="id" type="ID" />
>       <attribute name="ref" type="IDREF" />
>     </element>
> 
>     <!-- foo of string/string pair -->
>     <element name="foo">
>       <attribute name="id" type="string" />
>       <attribute name="ref" type="string" />
>     </element>
>   </choice>
> </element>
> 
> How can you decide that which "foo" is applicable? You have to decide,
> otherwise IDREF cannot be validated.
> 
> Example instance:
> 
> <root>
>   <foo id="x" ref="notDefinedSoThisMustBeString" />
>   <foo id="x" ref="y" />
>   <foo id="y" ref="x" />
> </root>
> 
> This problem (ID/IDREF problem) can be polynomial-reducible from the
> satisfiability problem of boolean logic, which is NP-complete. So in
> worst case, validation takes exponential time to the size of the
> instance.
> 
> regards,
> ----------------------
> K.Kawaguchi
> E-Mail: k-kawa@bigfoot.com
Follow-Ups:
- Re: ID/IDREF problem
  - From: Kohsuke KAWAGUCHI <kohsuke.kawaguchi@eng.sun.com>
- Re: ID/IDREF problem
  - From: Eric van der Vlist <vdv@dyomedea.com>