opendocument-users message

Subject: RE: [opendocument-users] RE: office-comment text:id vs xml:id (ODF 1.2CD01)

From: rjelliffe@allette.com.au
To: "'ODF Users List'" <opendocument-users@lists.oasis-open.org>
Date: Mon, 9 Mar 2009 15:15:45 +1100 (EST)

> 4. I don't think this will help us with xml:id and with the problems of
> not knowing what the referential integrity conditions of ODF are because
> we don't know what the references are under the existing schemas.  But
> namespace versioning, as it were, might help us when we see what hole it
> is we are digging out of.  I suppose that is another way to come at the
> transition issue.  (I'm still thinking that will be a 2.0-level
> disruption.)

With xml:id (and IDs in general) the problem is that there are three
different systems at work.

The first is IDs as used by linking applications: they typically make a
list (index) of name/locator pairs to allow fast unique keyed access of
elements. The second is IDs as used by traditional validators (DTDs, RELAX
NG, etc): they work by checking for duplicates, so that a document
reaching the application won't have duplicates (and therefore ambiguity)
to cope with. The third is IDs as type assignment, as in XSD and object
APIs: the element object simply gets a property typed as an ID.
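
To make the first two systems concrete, here is a rough Python sketch (the
sample document and the plain "id" attribute name are made up for
illustration, they are not from ODF). It only shows that building an index
and checking for duplicates are different jobs: the index quietly keeps the
last element it sees for a repeated name, while the validator-style pass
reports the repeat.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document with a repeated id value.
DOC = """<doc>
  <p id="a">first</p>
  <p id="b">second</p>
  <p id="a">third</p>
</doc>"""

root = ET.fromstring(DOC)

# System 1: a linking application builds a name -> element index.
# With a plain dict, a later duplicate silently overwrites the earlier entry.
index = {}
for el in root.iter():
    key = el.get("id")
    if key is not None:
        index[key] = el

# System 2: a validator reports duplicates so the application never meets them.
counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
duplicates = sorted(k for k, n in counts.items() if n > 1)

print(index["a"].text)  # third
print(duplicates)       # ['a']
```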

The trouble with the type assignment approach is that, in XML Schema (XSD)
datatypes, an ID (xs:ID) does not have any notion of uniqueness. See
http://www.w3.org/TR/xmlschema-2/#ID  This is because the primitive XSD
types are atomic: they are not checked with any reference to other
elements.

(In XSD, the uniqueness checking happens during validation using the
structures specification, see http://www.w3.org/TR/xmlschema-1/#sic-id and
the ID/IDREF table is part of the Post Schema Validation Infoset (PSVI).
Uniqueness is a function of schema processing, not intrinsic to the
datatype xs:ID.)
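
A small Python sketch of that split, under stated assumptions: the regular
expression is a simplified ASCII-only stand-in for the NCName production
(the real one admits many more characters), and both function names are
invented for this example. The datatype-style check looks at one value at a
time, so it cannot notice a repeat; the uniqueness check needs the whole set
of values.

```python
import re

# Assumption: ASCII-only approximation of NCName, for illustration only.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_lexically_valid_id(value):
    """Datatype-level check: looks at a single value in isolation."""
    return NCNAME.fullmatch(value) is not None

def duplicate_ids(values):
    """Document-level check: needs every value in the document at once."""
    seen, dups = set(), []
    for v in values:
        if v in seen and v not in dups:
            dups.append(v)
        seen.add(v)
    return dups

ids = ["c1", "c2", "c1"]
print(all(is_lexically_valid_id(v) for v in ids))  # True
print(duplicate_ids(ids))                          # ['c1']
```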

In XSD it was recognized that the ID/IDREF mechanism was too blunt an
instrument: it cannot cope with an element having two IDs (e.g., one for
one application and another for another, or when transitioning from one
attribute name to another). So XSD introduced a key/keyref mechanism (which
is based not on types but on names).  In the OASIS/ISO RELAX NG family (ISO
DSDL), there may be a more direct equivalent of key/keyref developed in the
future, but ISO Schematron can already handle these kinds of constraints.

I think it is a mistake to get too hung up on expecting some magic from
using xs:ID rather than xs:NCName (the type from which xs:ID is derived.)
Indeed, there is *no* difference in the lexical or value spaces of xs:ID
from xs:NCName.

When real life (i.e. document evolution) means that you have to go beyond
what ID provides, then you really need to move either to Schematron or to
XSD key/keyref.

For example, consider these possibilities:

* An element can have both a @name attribute and an @id attribute but
both must have the same value and be unique: this is what (X)HTML did.

* An element can have both a @name attribute and an @xml:id attribute but
both must have different unique values.

* An element can have both a @name attribute and an @xml:id attribute but
the @name values must be unique and the @xml:id values must be unique:
they form different key/keyref systems (this is what XSD supports).
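
The three possibilities above can be sketched as three separate checks. This
is only an illustration in Python: the function names and the sample
document are invented, and the second check reflects my reading of
"different unique values" as one shared pool of values in which nothing may
appear twice.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def pairs(xml_text):
    """Return (name, xml:id) for every element; None where absent."""
    root = ET.fromstring(xml_text)
    return [(el.get("name"), el.get(XML_ID)) for el in root.iter()]

def all_unique(values):
    present = [v for v in values if v is not None]
    return len(present) == len(set(present))

def same_value_and_unique(ps):
    # Case 1: where both are present they must match; values unique overall.
    both = [(n, i) for n, i in ps if n is not None and i is not None]
    agree = all(n == i for n, i in both)
    keys = [n if n is not None else i for n, i in ps]
    return agree and all_unique(keys)

def one_shared_key_space(ps):
    # Case 2 (assumed reading): names and ids share one pool of values.
    return all_unique(v for pair in ps for v in pair)

def separate_key_spaces(ps):
    # Case 3: names unique among names, ids unique among ids.
    return all_unique(n for n, _ in ps) and all_unique(i for _, i in ps)

# Hypothetical document: each value is used once as a name and once as an id.
DOC = """<doc>
  <a name="x" xml:id="y"/>
  <a name="y" xml:id="x"/>
</doc>"""

ps = pairs(DOC)
print(same_value_and_unique(ps))  # False
print(one_shared_key_space(ps))   # False
print(separate_key_spaces(ps))    # True
```

The same document passes or fails depending on which rule is meant, which
is why the rule has to be stated somewhere other than in the datatype.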

HTML is a good example to think about. They used @name at first, then
moved over to @id, but still allowing @name. Very sensible.

The point is that RELAX NG, XML Schemas and DTDs all only provide very
simple ID systems. It should not be a surprise to anyone that they run out
of steam very early. They are not designed to be anything other than 80/20
solutions.

Very simple Schematron schemas can allow declaration and validation of
almost any kind of complex ID/IDREF constraints: a 99/1 solution.

So one approach that the ODF WG could take is to say that an element with
both an old ID attribute and an xml:id should have the same value in both,
and then do one of the following:

1) Use xs:NCName for all IDs. Then make a Schematron schema for the
uniqueness constraints.

2) Use xs:ID for all existing IDs, and xs:NCName for all new attributes
such as xml:id.  Then make a Schematron schema for the uniqueness
constraints.

3) Use xs:ID for all xml:id attributes, and xs:NCName for all old IDs. Then
make a Schematron schema for the uniqueness constraints.

4) Use xs:ID for all xml:id attributes and for existing IDs on other
elements, and xs:NCName for superseded IDs. Then make a Schematron schema
for the uniqueness constraints.

I don't know that any of these choices is particularly better than
another. Probably people would be most comfortable with 4), because it
provides the best declarations that a databinding tool might use.
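
Whichever option is picked, the extra constraint layer has the same job. As
a rough stand-in for what the Schematron schema would assert (this is
Python, not Schematron, and the <sample>/<item> element names are invented;
only the text: namespace URI is the real ODF one), a check might look like
this:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEXT_ID = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}id"

def check_ids(xml_text):
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    seen = set()
    for el in ET.fromstring(xml_text).iter():
        old, new = el.get(TEXT_ID), el.get(XML_ID)
        # Rule 1: if both attributes are present they must agree.
        if old is not None and new is not None and old != new:
            problems.append(f"mismatch: text:id={old!r} xml:id={new!r}")
        # Rule 2: the element's identifier must be unique in the document.
        key = new if new is not None else old
        if key is None:
            continue
        if key in seen:
            problems.append(f"duplicate: {key!r}")
        seen.add(key)
    return problems

# Hypothetical sample; the element names are not from the ODF schema.
SAMPLE = """<sample xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <item text:id="n1" xml:id="n1"/>
  <item text:id="n2" xml:id="n3"/>
  <item xml:id="n1"/>
</sample>"""

for problem in check_ids(SAMPLE):
    print(problem)
```

Note that neither rule depends on whether the attributes are declared as
xs:ID or xs:NCName.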

Cheers
Rick Jelliffe

Follow-Ups:
- RE: [opendocument-users] RE: office-comment text:id vs xml:id (ODF 1.2CD01)
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>

References:
- RE: office-comment text:id vs xml:id (ODF 1.2CD01)
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
- RE: office-comment text:id vs xml:id (ODF 1.2CD01)
  - From: "Alex Brown" <alexb@griffinbrown.co.uk>
- RE: [opendocument-users] RE: office-comment text:id vs xml:id (ODF 1.2CD01)
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>