
Subject: Re: [office] The desirability of xml:id stability

From: Patrick Durusau <patrick@durusau.net>
To: office@lists.oasis-open.org
Date: Mon, 04 Feb 2013 16:30:58 -0500

Dennis,

On 02/04/2013 12:00 PM, Dennis E. Hamilton wrote:

In today's call, there was interesting discussion about producers preserving
xml:id attributes on elements that are preserved from a document that is
being consumed.  This is in reference to the proposal of OFFICE-3788:
<https://tools.oasis-open.org/issues/browse/OFFICE-3788>.

I believe that is a valuable feature for complex document cases, but that it
is not a good idea for a .x release of the ODF specification.  The ODF
1.0/1.1/1.2 line does not require any such preservation.  There is also
nothing to prevent an implementation from doing it.  So there is room for
implementations to determine whether it is important for their use cases.
There might be guidance about that, but I don't believe there should be any
requirement about it.  Absent implementation differentiation becoming a
factor in interoperability, it is perhaps not a good idea to suddenly impose
this requirement on implementations.

I don't think there is any question of "sudden imposition" of this as a requirement on implementations.

If it bothers you that it might appear in ODF 1.3, we can always change the number to 2.0.

When new versions of standards appear, software always takes some time to catch up. That has always been the case with new and exciting features.

Not to mention that the discussion overlooked the fact that ODF applications currently preserve attribute values for elements, lots of them.

I don't have numbers on that, so I do need to find an archive of realistic ODF documents to see how often applications currently save/re-write ids.

It is not clear that the benefit is such that all implementations would be
required to preserve xml:id attribute ID values so long as the element
having the xml:id occurrence persists.  As desirable as this might be from a
puristic position, it does damage to many implementations that have never
found a use case sufficient to implement this already-allowed capability.

How so?

If an implementation chooses not to support persistent IDs, it can always be an ODF 1.2 implementation that implements some extra features defined in ODF 1.3 (or some other numbering).

If "being allowed" were the test for interoperability, there is very little reason to specify values in ODF. Applications could mimic each other if they really wanted to. Nothing stops them from doing just that.

For calibration and added perspective, here are three use cases for the
preservation of xml:ids. All have problems.  These are all for preserving
xml:ids for referential integrity of references from outside the document
that refer to internal elements of a document (derivative).  Accommodating
any of them in ODF 1.3 might be a bridge too far.

And the basis for "...might be a bridge too far" is?

I understand that:

1) Present implementations don't preserve xml:ids

2) To require preservation of xml:ids would make existing applications non-conformant to some future version of ODF 1.3. (I assume if we change anything in ODF 1.3, they are not going to be fully conformant with ODF 1.3.)

3) What I am missing is some evidence, other than your saying it, that preservation of xml:ids is any harder than preserving any other attribute value.

I understand implementations don't do the preservation now, but at one time implementations didn't use XML either. Non-use doesn't mean that a proposal is too difficult or unworkable.

Being able to reliably point into documents would be the next step towards not simply getting document level pointers from a search engine.

I would rather get a <text:p> level pointer than a pointer to 800+ pages of text.


You?

Hope you are having a great day!

Patrick

PS: Having said all that, I welcome pointers to archives of ODF documents so I can investigate current usage on ids.
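
A minimal sketch of how such a survey of id usage might start, assuming only that an ODF document is a ZIP package with a content.xml part. The function name collect_xml_ids is hypothetical and not part of any ODF toolkit. Running it on a document before and after a round trip through an application would show which xml:id values survived.

```python
import zipfile
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared xml: prefix under this namespace name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_xml_ids(source, part="content.xml"):
    """Return {xml:id value: element tag} for one part of an ODF package.

    `source` is a file path or a binary file-like object holding the package.
    Hypothetical helper for illustration; only the named part is examined.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read(part))
    return {
        element.get(XML_ID): element.tag
        for element in root.iter()
        if element.get(XML_ID) is not None
    }
```

The same call with part="styles.xml" would cover another part of the package. Attributes of type ID other than xml:id are not covered by this sketch.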


CASE 1: [X]HTML Production.

When a document is saved as HTML, the xml:ids are presumably turned into
identified anchors.  This is necessary simply to allow for internal
cross-references by IDREF attribute values that target an xml:id ID value.

Changing those ID and IDREF values on editing of a replacement for an
existing HTML document will break any deep links into the updated HTML
export from anywhere else in the World Wide Web.  That may not be acceptable
for some usage of ODF implementations as tools for maintaining and producing
an HTML rendition.  (The same problem arises for user-created bookmarks and
the identifiers that are generated for them and cross-references to them.)
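
The deep-link breakage described in Case 1 can be illustrated with a small sketch. It assumes the ID values of the old and new HTML exports are already available as collections of strings. The function name broken_fragments and the example URL are hypothetical.

```python
def broken_fragments(old_ids, new_ids, base_url):
    """List fragment URLs that resolved in the old export but not in the new one.

    Hypothetical helper: an ID present before the edit and absent after it
    means any external link to base_url#ID no longer has a target.
    """
    lost = set(old_ids) - set(new_ids)
    return sorted(f"{base_url}#{value}" for value in lost)


# Example with made-up ID values: "a" was regenerated as "c" on re-export.
links = broken_fragments({"a", "b"}, {"b", "c"}, "http://example.org/doc.html")
```

If IDs are preserved across the edit, the returned list is empty and existing links from elsewhere on the Web keep working.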

CASE 2: RDF in the same package and elsewhere.  (Not just the RDFa in
content.xml itself)

ODF 1.2 permits RDF parts to be included in a document that refer into
elements of the document structure.  These RDF parts need a way to identify
the elements being referenced, and fragment IDs in URIs of the RDF terms are
the common means.

Likewise, when the RDF is extracted from the document (e.g., via a GRDDL
procedure) or is otherwise external from a document, that RDF can make use
of the ODF Package and OWL Document OWL classes to continue to refer to
specific elements internal to the ODF package.  To the extent that a
revision of the document is logically the same work with respect to the
nature of the RDF about it, not preserving fragment IDs becomes a problem.
(It is also a challenge to deal with the fact that ODF currently lacks a
means for creating a location-independent entity identification of a
document.  Something is needed for where different occurrences of instances
are to be taken as logically the same document.  This requires something
that can work as a persistent URI or URN for a document that is relatively
instance-independent and where the document is not necessarily found only at
a unique URL location on the Web.)

Finally, it is not to be expected that all implementations will be in a
position to adjust RDF within packages to align with changed xml:id ID
values in order to preserve the referential integrity from such metadata.
Some implementations will simply not deal with such RDF and they may but
need not preserve that RDF within the package.  (There are pros and cons
about this.  Having mystery material can be a problem for document
safety/security and also for documents that are digitally signed when there
is implementation-unknown material.)

ODF 1.2 doesn't constrain this and it is difficult to see what ODF 1.3 can
do beyond adding some guidance. It is perhaps better for guidance to be
worked out and demonstrated at OIC first.  That's certainly the case for RDF
that is not in the package at all.


CASE 3: ODF 1.2 CHANGE TRACKING

Depending on how references to portions of documents involving tracked
changes happen, there can be a problem with the preservation of xml:id
attributes.

In ODF 1.0/1.1/1.2 the connection of change information with the places in
the document where the change applies is accomplished by the xml:id ID value
on a <text:changed-region> element.  It is also the case that element start
tags with xml:id attributes can be swept up into <text:deletion> elements
that carry removed material.  Those xml:ids would need to be preserved,
since the deletion can be rejected in a later edit.  (This situation has
remarkable consequences for RDF now referencing an element that is
(partially) deleted.)

I don't know whether this is comprehended as an edge case for the MCT-based
change-tracking for ODF 1.3.

AND EDGE CASES

There are many edge cases to all of this.  There is the interaction with
change-tracking (and whether that can synchronize with arbitrary RDF in the
package), accessibility (also impacted by change tracking), and probably
other provisions, including concerns about covert content and digital
signatures.

It is also important to note that the xml:id attribute ID values in ODF 1.2
documents are generally not thought to be user-specifiable.  Where there are
user-specified names, these are in other attributes that are usually not
used as attribute values of type ID and IDREF.  (Note that this xml:id case
should actually be about all ODF 1.x attributes having values of type ID,
since uniqueness must be preserved across all of them.  The xml:id ones are
the only ones automatically accessible via fragment values in URI
references.)

- Dennis


PS: Another cat picture:
<http://www.flickr.com/photos/orcmid/1502722674/in/set-72157600230263578>.





--
Patrick Durusau
patrick@durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
