Subject: The desirability of xml:id stability
From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
To: "ODF TC List " <office@lists.oasis-open.org>
Date: Mon, 4 Feb 2013 09:00:48 -0800
In today's call, there was interesting discussion about producers preserving
xml:id attributes on elements that are preserved from a document that is
being consumed.  This is in reference to the proposal of OFFICE-3788: 
<https://tools.oasis-open.org/issues/browse/OFFICE-3788>.

I believe such preservation is a valuable feature for complex document
cases, but that requiring it is not a good idea for a .x release of the ODF
specification.  The ODF 1.0/1.1/1.2 line does not require any such
preservation.  There is also nothing to prevent an implementation from doing
it.  So there is room for implementations to determine whether it is
important for their use cases.  There might be guidance about that, but I
don't believe there should be any requirement about it.  Unless differences
among implementations become a factor in interoperability, it is perhaps not
a good idea to suddenly impose this requirement on implementations.

It is not clear that the benefit is great enough to justify requiring all
implementations to preserve xml:id attribute ID values so long as the
element having the xml:id occurrence persists.  As desirable as this might
be from a purist position, it does damage to many implementations that have
never found a use case sufficient to implement this already-allowed
capability.

For calibration and added perspective, here are three use cases for the
preservation of xml:ids. All have problems.  These are all for preserving
xml:ids for referential integrity of references from outside the document
that refer to internal elements of a document (derivative).  Accommodating
any of them in ODF 1.3 might be a bridge too far.


CASE 1: [X]HTML Production.  

When a document is saved as HTML, the xml:ids are presumably turned into
identified anchors.  This is necessary simply to allow for internal
cross-references by IDREF attribute values that target an xml:id ID value.  

If those ID and IDREF values change when an edited document is exported to
replace an existing HTML document, any deep links into the updated HTML
export from anywhere else in the World Wide Web will break.  That may not be
acceptable where ODF implementations are used as tools for maintaining and
producing an HTML rendition.  (The same problem arises for user-created
bookmarks, the identifiers that are generated for them, and cross-references
to them.)
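
To make the deep-link problem concrete, here is a minimal sketch in Python.
It assumes, purely for illustration, that an exporter turns each xml:id into
an HTML anchor of the same name; the function names and the sample markup
are hypothetical and are not taken from any ODF implementation.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def xml_ids(xml_text):
    """Return the set of xml:id values found anywhere in the XML text."""
    root = ET.fromstring(xml_text)
    return {el.attrib[XML_ID] for el in root.iter() if XML_ID in el.attrib}

def broken_deep_links(old_xml, new_xml, deep_links):
    """Return fragments that resolved in the old version but not the new one."""
    old_ids, new_ids = xml_ids(old_xml), xml_ids(new_xml)
    return sorted(f for f in deep_links if f in old_ids and f not in new_ids)

# Hypothetical before/after content: the first paragraph survives the edit,
# but the producer regenerated its xml:id on save.
old = '<doc><p xml:id="intro">Hello</p><p xml:id="sec1">World</p></doc>'
new = '<doc><p xml:id="id001">Hello</p><p xml:id="sec1">World!</p></doc>'

print(broken_deep_links(old, new, ["intro", "sec1"]))
```

Only the link to "intro" is reported, even though the paragraph it pointed
to is still present in the revised document.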

CASE 2: RDF in the same package and elsewhere.  (Not just the RDFa in
content.xml itself)

ODF 1.2 permits RDF parts to be included in a document that refer into
elements of the document structure.  These RDF parts need a way to identify
the elements being referenced, and fragment IDs in URIs of the RDF terms are
the common means.  

Likewise, when the RDF is extracted from the document (e.g., via a GRDDL
procedure) or is otherwise external from a document, that RDF can make use
of the ODF Package and OWL Document OWL classes to continue to refer to
specific elements internal to the ODF package.  To the extent that a
revision of the document is logically the same work with respect to the
nature of the RDF about it, not preserving fragment IDs becomes a problem.
(It is also a challenge to deal with the fact that ODF currently lacks a
means for creating a location-independent entity identification of a
document.  Something is needed for where different occurrences of instances
are to be taken as logically the same document.  This requires something
that can work as a persistent URI or URN for a document that is relatively
instance-independent and where the document is not necessarily found only at
a unique URL location on the Web.)
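
As a rough illustration of the referential-integrity concern, the following
Python sketch checks a list of URI references against the xml:id values that
are actually present in a content part.  The package-relative form
"content.xml#fragment", the function name, and the sample data are
assumptions made for the example; a real consumer would take the references
from the RDF parts themselves.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def dangling_references(content_xml, uri_refs, content_path="content.xml"):
    """Return the URI references whose fragment names no xml:id in the content.

    Only references whose path part equals content_path are checked;
    references into other parts are ignored in this sketch.
    """
    root = ET.fromstring(content_xml)
    known = {el.attrib[XML_ID] for el in root.iter() if XML_ID in el.attrib}
    dangling = []
    for uri in uri_refs:
        path, fragment = urldefrag(uri)
        if path == content_path and fragment and fragment not in known:
            dangling.append(uri)
    return dangling

# Hypothetical revised content in which the element formerly known as
# "gone" was kept but given a fresh xml:id.
content = '<doc><p xml:id="a1">kept</p><p xml:id="n7">renamed</p></doc>'
refs = ["content.xml#a1", "content.xml#gone", "styles.xml#s1"]

print(dangling_references(content, refs))
```

A producer that cannot run such a check, or cannot repair what it finds,
leaves the RDF silently pointing at nothing.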

Finally, it is not to be expected that all implementations will be in a
position to adjust RDF within packages to align with changed xml:id ID
values in order to preserve the referential integrity from such metadata.
Some implementations will simply not deal with such RDF and they may but
need not preserve that RDF within the package.  (There are pros and cons
about this.  Having mystery material can be a problem for document
safety/security and also for documents that are digitally signed when there
is implementation-unknown material.) 

ODF 1.2 doesn't constrain this and it is difficult to see what ODF 1.3 can
do beyond adding some guidance. It is perhaps better for guidance to be
worked out and demonstrated at OIC first.  That's certainly the case for RDF
that is not in the package at all.


CASE 3: ODF 1.2 CHANGE TRACKING

Depending on how references to portions of documents involving tracked
changes are made, there can be a problem with the preservation of xml:id
attributes.  

In ODF 1.0/1.1/1.2 the connection of change information with the places in
the document where the change applies is accomplished by the xml:id ID value
on a <text:changed-region> element.  It is also the case that element start
tags with xml:id attributes can be swept up into <text:deletion> elements
that carry removed material.  Those xml:ids would need to be preserved,
since the deletion can be rejected in a later edit.  (This situation has
remarkable consequences for RDF now referencing an element that is
(partially) deleted.) 

I don't know whether this has been considered as an edge case for the
MCT-based change-tracking for ODF 1.3.

AND EDGE CASES  

There are many edge cases to all of this.  There is the interaction with
change-tracking (and whether that can synchronize with arbitrary RDF in the
package), accessibility (also impacted by change tracking), and probably
other provisions, including concerns about covert content and digital
signatures.  

It is also important to note that the xml:id attribute ID values in ODF 1.2
documents are generally not thought to be user-specifiable.  Where there are
user-specified names, these are in other attributes that are usually not
used as attribute values of type ID and IDREF.  (Note that this xml:id case
should actually be about all ODF 1.x attributes having values of type ID,
since uniqueness must be preserved across all of them.  The xml:id ones are
the only ones automatically accessible via fragment values in URI
references.)
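
The uniqueness point can be sketched as a simple check in Python.  The list
of ID-typed attributes below (xml:id plus text:id) is an assumption for the
example and is not a complete inventory of ODF 1.x attributes of type ID;
the sketch also assumes that the same value in both attributes on a single
element is one identity, not a clash.

```python
from collections import Counter
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Assumption: an illustrative subset of the ID-typed attributes.
ID_ATTRIBUTES = (XML_ID, "{%s}id" % TEXT_NS)

def duplicate_ids(xml_text, id_attributes=ID_ATTRIBUTES):
    """Return ID values that occur on more than one element."""
    counts = Counter()
    for el in ET.fromstring(xml_text).iter():
        # Collect per element first, so xml:id="x" text:id="x" counts once.
        values = {el.attrib[a] for a in id_attributes if a in el.attrib}
        counts.update(values)
    return sorted(v for v, n in counts.items() if n > 1)

# Hypothetical fragment: "ct1" is used by a changed region and again by an
# unrelated paragraph, which violates uniqueness.
sample = (
    '<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:changed-region xml:id="ct1" text:id="ct1"/>'
    '<text:p xml:id="ct1"/>'
    '<text:p xml:id="p2"/>'
    '</doc>'
)

print(duplicate_ids(sample))
```

Any rule about preserving xml:id values would have to keep a check like
this satisfied across every ID-typed attribute, not only across xml:id.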
 
 - Dennis

PS: Another cat picture:
<http://www.flickr.com/photos/orcmid/1502722674/in/set-72157600230263578>.