Subject: Re: ID/IDREF problem
I am glad you raised the issue of ID and IDREF. It's part of the broader issue of what to do about uniqueness and cross-reference constraints (which XML Schema Part 1 calls "identity constraints"). I don't have a solution to propose, but let me offer some random thoughts.

ID is very important from an XML/SGML legacy perspective. IDREF(S) is slightly less important. In the context of the WWW, cross references more often use URI references than IDREF(S).

Apart from the legacy issue, ID/IDREF is a very weak way to do identity constraints. It doesn't support:

- scoping ID/IDREFs to a particular element;
- multipart keys;
- constraining IDREFs to point to the right kind of element;
- multiple distinct symbol spaces of IDs;
- having the identity of an element be specified by a child.

With ID, for many applications it is not sufficient that an attribute so labelled uniquely identifies an element. The application also needs to know that the attribute is serving as an ID. For many applications, this means that the attribute has to be declared as an ID in the DTD. For example, if you want to use id() in XSLT or XPath, or use #name in XPointer, then you need attributes declared as IDs in the DTD. Declaring it as an ID in a TREX pattern isn't sufficient. This suggests the need for a datatype which checks whether the type of an attribute in the infoset is ID.

In the XML Schema Datatypes CR, the uniqueness/cross-reference constraints are part of the semantics of the ID/IDREF datatypes. This doesn't really fit with XML Schema Datatypes' conceptual framework of what a datatype is. However, in the PR these constraints are moving into Part 1 (see http://www.w3.org/2000/12/xmlschema-crcomments.html#ID-value-space).

As Kawaguchi-san points out, you need some special constraints on how ID/IDREF are used to make validation of ID/IDREF workable. (RELAX includes such constraints.) It is possible to adapt my proposed algorithm for datatype assignment in TREX to provide such constraints for TREX.
(When used for this, you would only consider ID/IDREF datatypes when determining whether a set of character elements is ambiguous.)

A problem in having special constraints for the use of ID/IDREF datatypes in TREX is that it is undesirable (in my view) to make TREX depend on XML Schema Part 2. One possibility would be to introduce into TREX itself support for ID/IDREF (e.g. additional <id/>, <idref/> and <idrefs/> patterns, or an additional attribute on the <attribute/> element), along with appropriate constraints (such as in 9) to enable easy validation. On the other hand, I find it rather distasteful to clutter the core TREX language with support for something that should in my opinion be viewed as a legacy construct.

In addition to ID/IDREF, XML Schema Part 1 has a new facility for identity constraints, using the xsd:unique, xsd:key, xsd:keyref, xsd:selector and xsd:field elements (see http://www.w3.org/TR/xmlschema-1/#Identity-constraint_Definition_details). This is quite a good facility. The only thing I think it lacks is proper support for hierarchical references. On the other hand, the elements supporting identity constraints have the feel of a separate language, which could easily be layered on top of the rest of Schema Part 1.

We could support these XML Schema Part 1 elements for identity constraints in TREX by allowing <element> elements to have arbitrary elements from other namespaces with a global attribute trex:role="constraint". Just as you can plug a datatyping language into TREX, so you would be able to plug in an identity-constraint language. However, there is some awkwardness in combining the XML Schema Part 1 syntax for identity constraints with the TREX pattern syntax. For example, TREX patterns have one way to specify classes of names; XML Schema Part 1 (via XPath) has a completely separate way.

It would be possible to come up with a syntax for identity constraints that was much more harmonious with TREX than the XML Schema Part 1 syntax.
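
To make the key/keyref style of identity constraint a little more concrete, here is a rough Python sketch of the kind of check such a facility performs: keys are tuples of attribute values (so multipart keys work), and they are collected per enclosing element (so they are scoped). This is only an illustration and not the XML Schema Part 1 algorithm. The sample document, its element and attribute names, and the function names are all made up, and the "selector" is just a child element name rather than a real XPath-based xsd:selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element and attribute names are invented.
SAMPLE = """
<catalog>
  <section name="a">
    <part vendor="acme" code="1"/>
    <part vendor="acme" code="2"/>
    <use vendor="acme" code="1"/>
  </section>
  <section name="b">
    <part vendor="acme" code="1"/>
    <use vendor="acme" code="2"/>
  </section>
</catalog>
"""


def collect_keys(scope, selector, fields):
    """Return (keys, problems) for elements matching selector inside scope."""
    keys, problems = set(), []
    for elem in scope.findall(selector):
        # A key is the tuple of the named attribute values (a multipart key).
        key = tuple(elem.get(name) for name in fields)
        if None in key:
            problems.append(("missing field", key))
        elif key in keys:
            problems.append(("duplicate key", key))
        else:
            keys.add(key)
    return keys, problems


def dangling_refs(scope, selector, fields, keys):
    """Return reference tuples inside scope that match none of the keys."""
    dangling = []
    for elem in scope.findall(selector):
        ref = tuple(elem.get(name) for name in fields)
        if ref not in keys:
            dangling.append(ref)
    return dangling


def check(xml_text):
    """Check each <section> on its own, so keys are scoped per section."""
    report = {}
    for section in ET.fromstring(xml_text).findall("section"):
        keys, problems = collect_keys(section, "part", ("vendor", "code"))
        refs = dangling_refs(section, "use", ("vendor", "code"), keys)
        report[section.get("name")] = (problems, refs)
    return report


if __name__ == "__main__":
    print(check(SAMPLE))
```

Note that the pair ("acme", "1") appears as a key in both sections without conflict, and that the reference in section "b" is reported as dangling even though a matching key exists in section "a". Neither behaviour can be expressed with plain ID/IDREF, which has a single document-wide symbol space.
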
It would also be possible to handle identity constraints as a completely separate language. In addition to having a TREX pattern specifying the structure, you would also have an identity-constraint specification. This would be much more convenient if there were some nice way to package the two together.

There's an interaction between identity constraints and datatyping. You may want to compare keys according to their datatypes rather than just lexically comparing strings.

James

Kohsuke KAWAGUCHI wrote:
>
> I'd like to bring up the problem of ID and IDREF. ID and IDREF are very
> problematic for validating processor.
>
> Is there any plan to add some constraint to TREX to avoid this problem?
> Or should processors be capable of handling this?
>
> Consider the following grammar.
>
> <element name="root">
>   <choice>
>     <!-- foo of ID/IDREF -->
>     <element name="foo">
>       <attribute name="id" type="ID" />
>       <attribute name="ref" type="IDREF" />
>     </element>
>
>     <!-- foo of string/string pair -->
>     <element name="foo">
>       <attribute name="id" type="string" />
>       <attribute name="ref" type="string" />
>     </element>
>   </choice>
> </element>
>
> How can you decide that which "foo" is applicable? You have to decide,
> otherwise IDREF cannot be validated.
>
> Example instance:
>
> <root>
>   <foo id="x" ref="notDefinedSoThisMustBeString" />
>   <foo id="x" ref="y" />
>   <foo id="y" ref="x" />
> </root>
>
> This problem (ID/IDREF problem) can be polynomial-reducible from the
> satisfiability problem of boolean logic, which is NP-complete. So in
> worst case, validation takes exponential time to the size of the
> instance.
>
> regards,
> ----------------------
> K.Kawaguchi
> E-Mail: k-kawa@bigfoot.com
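
The choice problem in the quoted example can be made concrete with a small brute-force sketch in Python. It is only an illustration, not an algorithm proposed in this thread. It assumes that each foo element independently picks one of the two alternatives, that the ID/IDREF reading requires ID values to be unique and every IDREF to match some ID, and it ignores lexical checks on the values. The function names are invented for the example.

```python
from itertools import product
import xml.etree.ElementTree as ET

# The example instance from the quoted message.
INSTANCE = """
<root>
  <foo id="x" ref="notDefinedSoThisMustBeString"/>
  <foo id="x" ref="y"/>
  <foo id="y" ref="x"/>
</root>
"""


def consistent(foos, choice):
    """choice[i] is True when foo i is read as the ID/IDREF alternative."""
    typed = [f for f, is_id in zip(foos, choice) if is_id]
    ids = [f.get("id") for f in typed]
    if len(ids) != len(set(ids)):
        return False  # two elements read as IDs share the same value
    # Every element read as ID/IDREF must reference an existing ID.
    return all(f.get("ref") in ids for f in typed)


def valid_assignments(xml_text):
    """Try all 2**n readings of the n foo elements; keep the consistent ones."""
    foos = ET.fromstring(xml_text).findall("foo")
    return [
        choice
        for choice in product((False, True), repeat=len(foos))
        if consistent(foos, choice)
    ]


if __name__ == "__main__":
    for choice in valid_assignments(INSTANCE):
        print(choice)
```

For this instance the search finds two consistent readings: all three elements read as string pairs, or the first read as a string pair and the other two as ID/IDREF. With this particular grammar the all-string reading always succeeds, so the document is trivially valid; the difficulty is that the validator cannot tell which attributes are IDs without searching, and the search space doubles with each additional foo.
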