opendocument-users message

Subject: RE: office-comment text:id vs xml:id (ODF 1.2CD01)
From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
To: "'Alex Brown'" <alexb@griffinbrown.co.uk>,"ODF Users List" <opendocument-users@lists.oasis-open.org>
Date: Sat, 7 Mar 2009 09:29:02 -0800
Alex,

I think the situation with xml:id is much messier than first appears.

There are at least four things going on here, as far as I can tell, which means the situation may be over-constrained beyond all resolution.

 1. There are pre-existing ODF attributes of ID type that there is an effort to deprecate in favor of xml:id instead, some of the time.

 2. We don't know, from the ODF specification, what possible references to the existing ones, as IDs, are actually defined for ODF, whether via URIs (and fragment identifiers) or other means that are ODF-unique (though sometimes via an IDREF or IDREFS).

 3. It would be nice to be able to coin xml:id values on the assumption that there are no other attributes of type ID that might present conflicts.  (Of course, if you don't know what is going on, intentionally duplicating the value of some *:id attribute value for an xml:id attribute on the same element is craziness.  Dealing with *:name attributes is out of the question, especially since most of them are of type string and they are not always instances of identifier assignment or of identifier reference.)

 4. Oh, and we don't know which of the pre-existing cases are meant to be used in fragment identifiers for URI reference as part of ODF-specific provisions, but that is mainly the whole point in moving toward a use of xml:id (i.e., for RDF-carried URIs and other cases that may arise).

 - Dennis

This message is my personal observation and any similarity to an official position of the ODF TC or of OASIS is purely coincidental.  Were there such an official position, there'd be provision of a link to the official minutes or other approved document where the official position is expressed.

Dennis E. Hamilton
------------------
NuovoDoc: Design for Document System Interoperability 
mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430 
http://NuovoDoc.com http://ODMA.info/dev/ http://nfoWorks.org  

ADDITIONAL OBSERVATIONS

 1. ODF creates a set of not-exactly-ID type values for purposes of its document model.  This is the sort of thing that is respected by the xml:id specification.  This is also something that seems to me as best done without using ID, IDREF, and IDREFS as well as disguised forms of those if one is also interested in keeping the constrained document-model cases separate from arbitrary use of fragment identifiers on elements (as is done with the introduction of the RDF Metadata and apparently with some SVG/SMIL bits too).  An important case for these different (non-ID) identifiers is their direct use in connecting material in different XML Document parts of an ODF Document package by references that are not URIs (i.e., they are more akin to IDREF and IDREFS in usage but hopefully not implementation).  The problem that I'm observing is that the commingling of these with (or replacement by) xml:id is not thought out. 

  1.1 The clash we are having is with the half-baked notion that we should just settle on xml:id for all of it.  The insane part is that the ODF specification, now and previously, does not make it clear which occurrences of its pseudo-ID and name types are establishing the identification of an entity, what the domain of uniqueness is, and what are the occurrences that are to be interpreted as references to one of those entities so-identified.  

  1.2 It is my considered assessment that it is not possible to know the intention here, and the only way we will be able to figure it out is to document what some implementations actually do, abstract it usefully, and see if the ODF TC will actually agree that is what is required.  You can see that, absent such analysis, willy-nilly substituting xml:id everywhere is a serious problem.

 2. Just to show what the half-bakedness is, consider this:

   2.1 If you look at ODF 1.1, you will see that @text:id is of type ID.  It is being deprecated (whatever that means) in 1.2 and, as part of changing it to type NCName in ODF 1.2, there is bizarre language around the way its value and the now-required (when and why?) xml:id value are to be in agreement.  How one deprecates text:id in favor of xml:id is not stated in 18.920, but the game is described in 18.814.9 and other *:id deprecations have similar language.  The ultimate strangeness is the sudden promotion of an existing text:id from type NCName to type ID if there is no xml:id on the same element.      

   2.2 Now, although it was recently denied in a remark on the ODF TC list, this smells of a way to make a transition from what was a specific attribute of ID type to substitute use of xml:id, but in a way where the existing (pseudo-) IDREF and IDREFS will still work down-level and as an upward compatibility.  I have not figured out any other explanation.  There are statements like "Applications that write documents may still write text:id attributes for these elements in addition to xml:id attributes."  (If I implement knowing this rule, so I am going to create a proper xml:id anyhow, why would I ever produce a text:id with the same value, knowing that a down-level processor may choke on the ID-value clash if it also accepts the xml:id?)  
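
To make the clash in 2.1 and 2.2 concrete, here is a minimal Python sketch. It encodes only one reading of the rule as described above (text:id counts as an ID only when the element has no xml:id) and compares it with a hypothetical down-level processor that treats both attributes as type ID. The function names and the sample document are illustrative assumptions, not anything taken from the ODF specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEXT_ID = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}id"


def id_values_promoted(root):
    """ID values under the reading in 2.1 (an assumption, not spec text):
    xml:id is always an ID; text:id is promoted to ID only if xml:id is absent."""
    ids = []
    for el in root.iter():
        if XML_ID in el.attrib:
            ids.append(el.attrib[XML_ID])
        elif TEXT_ID in el.attrib:
            ids.append(el.attrib[TEXT_ID])
    return ids


def id_values_both(root):
    """ID values as seen by a hypothetical processor that treats
    both xml:id and text:id as type ID."""
    ids = []
    for el in root.iter():
        for key in (XML_ID, TEXT_ID):
            if key in el.attrib:
                ids.append(el.attrib[key])
    return ids


def duplicates(values):
    """Return the sorted values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


# Illustrative document: the first paragraph carries both attributes with
# the same value, as the quoted "may still write text:id" language permits.
SAMPLE = (
    '<root xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:p xml:id="p1" text:id="p1"/>'
    '<text:p text:id="p2"/>'
    '</root>'
)

root = ET.fromstring(SAMPLE)
promoted_dups = duplicates(id_values_promoted(root))
both_dups = duplicates(id_values_both(root))
print("promoted reading:", promoted_dups)
print("both-as-ID reading:", both_dups)
```

Under the promoted reading the sample has no duplicate IDs, while the both-as-ID processor sees the first paragraph's value twice, which is the choke point asked about in 2.2.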

 3. As a Johnny-come-lately to this situation, I have a different view.

  3.1 I believe that, insofar as there is no intention that the *:id and *:name and similar attributes carried over from ODF 1.1 have an ODF-defined use as fragment identifiers, they should all be changed to NCNames (although some are of string type in ODF 1.1) and the following done: If there are no defined attribute values and elements that give rise to references to those thingies, they should simply be deprecated along with specification that they be ignored (without knowing why they are there, in this case, it is hard to ask that they be removed from the specification entirely, but that is an even better idea).  If there are defined attribute values and elements that constitute references, they should also be adjusted to no longer be treated as IDREF and IDREFS cases but as NCName and NCNames cases with the association between reference and identified entity clearly established in the ODF specification itself, with all of the still-needed explanation on agreement between the nature of the referred-to and the referred-from and the relevant constraints that are currently left to the imagination. 

  3.2 Such adjustment would free up the independent use of xml:id as a fragment identifier for any in-ODF or extra-ODF reference to those identified entities by URIs.  In this case, the obvious rule is to say that xml:id values should be preserved so long as the associated element is preserved, simply because the referential-integrity constraints are potentially unknown to the ODF processor.  This would allow accretions such as the RDF Metadata additions of 1.2 to be implemented without having to fight over who owns the particular xml:id and also with confidence that no attribute but xml:id introduces ID type values into the domain of required-to-be-unique ID values.  (That is, one does not have to know the schema or the specification to insert an xml:id, to refer to it with a URI, and to preserve it so long as the corresponding element is untouched.  This seems to be as good as it gets with ID type assignment and value uniqueness conditions without also having document-model constraints too.)

  3.3 This separation of concerns may be impossible (technically, not just ideologically), since there may well be fragment-naming URIs that already need to be supported.  Either way, we need to figure that out.  That or just put our heads on our desks and weep.
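
As a rough illustration of what 3.2 would buy, the sketch below coins a fresh xml:id by looking at existing xml:id values only, on the assumption argued for there that no other attribute contributes to the domain of unique ID values. The helper name coin_xml_id, the "id" prefix, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def coin_xml_id(root, prefix="id"):
    """Return an xml:id value not yet used in the tree.

    Assumes xml:id is the only attribute of type ID, so no schema
    knowledge is needed to avoid a clash.
    """
    used = {el.attrib[XML_ID] for el in root.iter() if XML_ID in el.attrib}
    n = 1
    while "%s%d" % (prefix, n) in used:
        n += 1
    return "%s%d" % (prefix, n)


root = ET.fromstring('<doc><p xml:id="id1"/><p xml:id="id2"/><p/></doc>')
new_id = coin_xml_id(root)
root[2].set(XML_ID, new_id)  # label the third, previously unlabeled element
print(new_id)
```

If text:id or other *:id attributes can also be of type ID, this scan is no longer sufficient, which is the point of separating the two concerns.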



-----Original Message-----
From: Alex Brown [mailto:alexb@griffinbrown.co.uk] 
http://lists.oasis-open.org/archives/office-comment/200903/msg00083.html
Sent: Saturday, March 07, 2009 01:54
To: office-comment@lists.oasis-open.org
Subject: [office-comment] text:id vs xml:id (ODF 1.2CD01)

Dear all,

The <text:p> element declares optional attributes @text:id and @xml:id.

We are told (18.814) that @text:id "specifies an ID or a name for an element". In what sense can this ever specify an "ID", since in the schema the attribute is declared to be of type NCName?

The @xml:id attribute *is*, however, declared to be of type ID. We are told (18.920) that xml:id "gives an element an unique identification in its XML file".

It is absurd to attempt to have conflicting identification systems declared on an element, and it is a fundamental precept of XML that elements have only one identifier.

Consider a document containing the following two <text:p> structures

<text:p xml:id="a" text:id="b"/> 

<text:p xml:id="b" text:id="a"/>

What is the result of applying the DOM getElementById() method to this document with a param of "a" ?

What is the result of resolving a URI to this document with a fragment identifier of "#a" ?
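
The ambiguity can be sketched in Python with the standard library. This is only an illustration: the wrapping root element and the find_by helper are assumptions added to make the fragment parseable, and the lookup is a plain attribute match, not a DOM or schema-aware ID resolution.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The two paragraphs from the example, wrapped in a hypothetical root
# element that declares the ODF text namespace.
DOC = (
    '<root xmlns:text="%s">'
    '<text:p xml:id="a" text:id="b"/>'
    '<text:p xml:id="b" text:id="a"/>'
    '</root>' % TEXT_NS
)


def find_by(root, attr_ns, value):
    """Return 0-based positions of child elements whose id attribute
    in the given namespace equals value."""
    key = "{%s}id" % attr_ns
    return [i for i, el in enumerate(root) if el.get(key) == value]


root = ET.fromstring(DOC)
by_xml_id = find_by(root, XML_NS, "a")    # treat xml:id as the identifier
by_text_id = find_by(root, TEXT_NS, "a")  # treat text:id as the identifier
print("xml:id  = a ->", by_xml_id)
print("text:id = a ->", by_text_id)
```

The two lookups select different paragraphs for the same value "a", so the answer depends entirely on which attribute a processor regards as the identifier.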

PROPOSAL. xml:id should be sole, standard, way of assigning (XML) identifiers to elements. All other mechanisms need to be removed.

- Alex.