Subject: ODF 1.2: Simplifying and generalizing xml:id for RDF and other connections

From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
To: "ODF TC List" <office@lists.oasis-open.org>
Date: Sun, 14 Dec 2008 22:29:20 -0800

I notice that the introduction of xml:id is becoming very messy in ODF 1.2, leading to some complex conditional usages in conjunction with other attributes of type ID, IDREF, and IDREFS. This is a matter of concern because xml:id was not a consideration in ODF 1.0/1.1, yet those versions already have other attributes of type ID (and IDREF and, potentially, IDREFS).

I can also see why xml:id is very valuable in the Metadata proposal and in other proposals that may require reference to ODF elements via URI fragment syntax.

At the same time, introducing xml:id element by element is creating a complicated nest of details. This is especially apparent in the metadata proposal where xml:id is asserted on specific ODF sub-file elements in conjunction with the intention to refer to those particular elements in document metadata.

It strikes me that there is a simpler way to do this. I am wondering if this will solve a variety of concerns that exist around use of xml:id and the redundant and disparate use of :id and :name attributes in ODF 1.0/1.1/1.2. I have not seen this discussed and I confess ignorance of earlier conversations that may have looked at this alternative. Here is a sketch of my thinking:

SKETCH

1. Add xml:id, of type ID, as an optional attribute on ALL elements defined in the ODF 1.2 schema.

2. The principle is that

* This be the only attribute of type ID defined in the ODF Schema (bear with me, I am not going to break anything).
* The xml:id is for arbitrary reference as an XML-document fragment identifier and is available for use as a URI fragment identifier for any purpose, including via other material included in a package, via RDF URIs in the metadata extension, and so on.
* There will be no reference to xml:ids of XML-document elements using attributes of type IDREF or IDREFS.
* xml:id values shall/must be preserved unchanged when they are present in an XML-document that satisfies the ID-processing requirements of the xml:id 1.0 specification.
* Any use of xml:id can deal with the elements that carry xml:id, and with the uniqueness requirements for type ID, without concern for the types of any other attributes encountered in an XML document that occurs as an ODF sub-document (including a free-standing office:document element). It is not necessary to validate against the schema, or even be aware of the schema, simply to affix xml:id attributes and to access elements by the xml:id attributes that are present. (This is one of the purposes for introducing xml:id.)
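
To make the last point concrete, here is a minimal Python sketch of schema-unaware access by xml:id. It assumes only that the document is well-formed XML; the sample fragment and the function name index_by_xml_id are mine, made up for illustration, and are not part of any ODF draft.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so no schema is needed.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def index_by_xml_id(root):
    """Map each xml:id value to its element, rejecting duplicate values."""
    index = {}
    for elem in root.iter():
        value = elem.get(XML_ID)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate xml:id: {value!r}")
        index[value] = elem
    return index

# Hypothetical fragment, for illustration only.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p xml:id="para1">First</text:p>
  <text:p xml:id="para2">Second</text:p>
</office:document>"""

root = ET.fromstring(SAMPLE)
index = index_by_xml_id(root)
target = index["para2"]  # what a fragment identifier such as #para2 would select
```

Nothing here consults the types of other attributes, which is the property the principle relies on.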

3. The saving play is to also do the following:
* For every current ODF schema attribute of type ID (except xml:id), change the type to NCName.
* For every ODF schema attribute that is of type IDREF, change the type to NCName (or to a derived NCNameRef type if desired).
* For every ODF schema attribute that is of type IDREFS, change the type to NCNames (or to a derived NCNameRefs type).
* The language around those existing attributes needs to avoid use of "ID" and the other type names in the text, perhaps calling them element identifiers and identifier references or something more palatable but not involving the common type names ID, IDREF, and IDREFS.
* In this way, there is no deprecation of the usage in ODF 1.0/1.1, the semantics are the same, and there is no complication with having xml:id showing up along-side those attributes starting with ODF 1.2.
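
The retyping in (3) is mechanical, and could be sketched in Python roughly as follows. Assumptions of mine: the schema is in RELAX NG XML syntax using the XML Schema datatype library, the fragment below is illustrative rather than copied from the ODF schema, and an IDREFS type is rewritten as an explicit list of NCName values, standing in for the "NCNames" type named above, since no datatype of that name is built in.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"

# Illustrative fragment only; not copied from the actual ODF schema.
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <define name="example-attlist">
    <attribute name="xml:id"><data type="ID"/></attribute>
    <attribute name="draw:id"><data type="ID"/></attribute>
    <attribute name="draw:start-shape"><data type="IDREF"/></attribute>
    <attribute name="draw:nav-order"><data type="IDREFS"/></attribute>
  </define>
</grammar>"""

def retype_attributes(grammar):
    """Apply step 3 in place; return (attribute name, old type) per change."""
    changed = []
    for attr in grammar.iter(f"{{{RNG}}}attribute"):
        name = attr.get("name")
        if name == "xml:id":
            continue  # xml:id remains the only attribute of type ID
        # Materialize the matches first, because the loop edits the tree.
        for data in list(attr.iter(f"{{{RNG}}}data")):
            old = data.get("type")
            if old in ("ID", "IDREF"):
                data.set("type", "NCName")
            elif old == "IDREFS":
                # Becomes <list><oneOrMore><data type="NCName"/></oneOrMore></list>
                data.tag = f"{{{RNG}}}list"
                data.attrib.clear()
                more = ET.SubElement(data, f"{{{RNG}}}oneOrMore")
                ET.SubElement(more, f"{{{RNG}}}data", {"type": "NCName"})
            else:
                continue
            changed.append((name, old))
    return changed

grammar = ET.fromstring(SCHEMA)
changes = retype_attributes(grammar)
remaining = [d.get("type") for d in grammar.iter(f"{{{RNG}}}data")]
```

After the call, the only data type "ID" left in the fragment is the one on xml:id.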

4. WITH THE CHANGE IN (3), THE SAME DOCUMENTS USING THE ODF 1.0 AND 1.1 ATTRIBUTES OF TYPES ID, IDREF, AND IDREFS WILL STILL VALIDATE, because every ID or IDREF value is lexically an NCName and every IDREFS value is a list of them. And the semantics are preserved because those uses are all constrained to be references among particular elements which are constrained by semantic-consistency conditions. So nothing has been lost, and the introduction of xml:id with its special requirements and need for global uniqueness across an entire XML document is simply moved to the side for arbitrary use in adding xml:id as a fragment identifier with no concern for ODF document semantics.

5. It seems to me that this is the one chance we have to allow xml:id cleanly without having to go through special contortions around having multiple attributes of type ID on the same element.

- Dennis

ADDITIONAL CONSIDERATIONS

6. I am unable to resolve the URL included in the ODF 1.2 draft for the xml:id specification. The only W3C Recommendation appears to be xml:id 1.0 of 2005-09-09, at http://www.w3.org/TR/2005/REC-xml-id-20050909/, and this is the latest version.

7. It is important to understand that an application of XML documents, such as ODF, cannot modify or extend the xml:id specification as part of allowing its use. xml:id (as for all attributes beginning with "xml") is not available to modify or qualify.

8. Constraints that should be of concern for the ODF usage of attributes with specialized :id and :name are that
- the values of all attributes of type "ID" (which includes all xml:id attributes) within a document are unique
- each element has at most one unique identifier
I believe the most straight-forward way to ensure adherence to these constraints is to have only xml:id be of type ID.
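
A rough Python sketch of checking the two constraints in (8) follows. The function takes the set of attribute names treated as type ID as a parameter, so the same hypothetical fragment can be checked under both readings; the fragment, the helper names, and the choice of draw:id and text:id as the ID-typed attributes in the first call are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

def qname(prefixed):
    """Expand prefix:local into ElementTree's {namespace}local form."""
    prefix, local = prefixed.split(":")
    return f"{{{NS[prefix]}}}{local}"

def id_violations(root, id_typed):
    """Return (tags of elements with several ID-typed attributes,
    values that occur more than once across all ID-typed attributes)."""
    names = [qname(n) for n in id_typed]
    multiple, seen = [], Counter()
    for elem in root.iter():
        present = [elem.get(n) for n in names if elem.get(n) is not None]
        if len(present) > 1:
            multiple.append(elem.tag)
        seen.update(present)
    duplicates = sorted(v for v, count in seen.items() if count > 1)
    return multiple, duplicates

# Hypothetical fragment, for illustration only.
SAMPLE = """<root
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <draw:rect xml:id="a1" draw:id="shape1"/>
  <text:p text:id="shape1"/>
</root>"""

root = ET.fromstring(SAMPLE)
several_id_types = id_violations(root, ["xml:id", "draw:id", "text:id"])
only_xml_id = id_violations(root, ["xml:id"])
```

With three ID-typed attributes the fragment breaks both constraints; with xml:id as the only attribute of type ID, neither can arise from the :id attributes.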

9. I also believe it is far more appropriate to avoid using xml:id for those identifiers which are part of the ODF-specific structural semantics for coordinating elements. It works better if xml:id is more primitive than that, and if IDREF is avoided in the specialized structural cross-referencing.

10. It is also valuable to allow, in this way, individual domains whose value sets have the lexical form of NCName type to be independent, since they are naturally constrained by their naming and ODF-specific function. (These could even be made distinct types, e.g., text-id and draw-id, but that seems to be overkill.) This proposal preserves the possibility that implementations might have taken advantage of uniqueness relative to a particular type of element being identified.

11. There is nothing to deprecate from ODF 1.0 and 1.1 with this approach. There might be a desire to deprecate some identifier attributes that appear to be redundant and under-utilized (e.g., draw:id), but that can be an independent decision now, having nothing to do with xml:id occurrences in ODF documents.

12. There are some uses of ID-like attributes that are defined to be of type string. It may be valuable to deprecate the string case and use NCName. That would be a deprecation and an approach for transition would be required. It might be valuable for greater consistency in 1.2 and moving to the future.

13. There are some immediate concerns for the syntax of XML Names in xml:id ID type and other NCName type values. It appears that NCNames and xml:id must now allow the full range of Unicode "letters" and limited special characters of XML 1.0 (Fifth Edition). It is not necessary to bite that bullet if the NCName syntax is taken from an earlier specific XML 1.0 edition that is otherwise normative for ODF 1.0/1.1/1.2. This needs to be pondered a little. For example, the definition of letter for NCNameStartChar in XML 1.0 (Second Edition) now refers to XML 1.0 (Fifth Edition) for its syntax. This is something that all ODF processors may need to pay attention to (since UTF-16 surrogate pairs are allowed to cover all of the Unicode code points now admissible). See also http://www.w3.org/TR/2008/REC-xml-20081126/#sec-suggested-names
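
For reference, the Fifth Edition name syntax can be sketched as a Python regular expression. The ranges below are my transcription of the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition) with the colon removed, as NCName requires; the names NCNAME_5E and is_ncname are made up for illustration, and the ranges should be checked against the Recommendation before anyone relies on them.

```python
import re

# NameStartChar of XML 1.0 (Fifth Edition), minus ":" (NCName has no colon).
_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, the middle dot, and combining marks.
_REST = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NCNAME_5E = re.compile(f"[{_START}][{_REST}]*")

def is_ncname(value):
    """True if value matches the Fifth Edition NCName syntax sketched above."""
    return NCNAME_5E.fullmatch(value) is not None

# A name starting with a code point above U+FFFF, which takes a surrogate
# pair in UTF-16.
supplementary = "\U00010400x"
accepted = is_ncname(supplementary)
```

A processor that stores names as UTF-16 code units has to treat such a surrogate pair as one name character, which is the concern raised in (13).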

14. When <office:document> appears as an element in the interior of an ODF root element (e.g., as the content element of a <draw:object>), the xml:id uniqueness rules extend to it as an element of a larger single XML document. That need not be the case, and desirably is not the case, for the structural cross-references within that <office:document> element, which seems to me a good thing. That is, there should be no tacit connection between styles and other referenced things in the contained <office:document> and those in the containing root element. If we don't want it that way, it probably needs to be pointed out in very large type.
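
The distinction in (14) might be sketched in Python like this: xml:id values are gathered across the whole XML document, while a structural identifier (draw:id here, chosen only as an example) is gathered per enclosing <office:document>. That per-document scoping is my reading of what is desirable, not something the drafts state, and the fragment and the function name collect are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DRAW = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
DOC = f"{{{OFFICE}}}document"
DRAW_ID = f"{{{DRAW}}}id"

def collect(elem, scope, xml_ids, scoped):
    """xml:id is document-wide; draw:id is kept per <office:document>."""
    if elem.tag == DOC:
        scope = elem  # entering a new structural scope
    if elem.get(XML_ID) is not None:
        xml_ids.setdefault(elem.get(XML_ID), []).append(elem)
    if elem.get(DRAW_ID) is not None:
        names = scoped.setdefault(scope, {})
        names.setdefault(elem.get(DRAW_ID), []).append(elem)
    for child in elem:
        collect(child, scope, xml_ids, scoped)

# Hypothetical fragment: an embedded document reuses draw:id="s1".
SAMPLE = f"""<office:document xmlns:office="{OFFICE}" xmlns:draw="{DRAW}">
  <draw:rect draw:id="s1" xml:id="x1"/>
  <draw:object>
    <office:document>
      <draw:rect draw:id="s1" xml:id="x2"/>
    </office:document>
  </draw:object>
</office:document>"""

root = ET.fromstring(SAMPLE)
xml_ids, scoped = {}, {}
collect(root, None, xml_ids, scoped)
```

In this fragment the reused draw:id value does not collide, because each <office:document> is its own scope, whereas the two xml:id values have to differ because they share one XML document.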

15. I don't know how (14) works for external entities that happen to be in the form of XML documents/elements, nor for the case where a free-standing <office:document> file is accessed by reference from another XML document (again, say via a <draw:object> element). That needs to be figured out, though, especially if RDF of the metadata extension sort comes along.

- Dennis

Dennis E. Hamilton
------------------
NuovoDoc: Design for Document System Interoperability
mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430
http://NuovoDoc.com http://ODMA.info/dev/ http://nfoWorks.org