opendocument-users message

Subject: RE: office-comment text:id vs xml:id (ODF 1.2CD01)
From: "Alex Brown" <alexb@griffinbrown.co.uk>
To: <dennis.hamilton@acm.org>,"ODF Users List" <opendocument-users@lists.oasis-open.org>
Date: Sun, 8 Mar 2009 16:30:21 -0000

Dennis hi

> -----Original Message-----
> From: Dennis E. Hamilton [mailto:dennis.hamilton@acm.org]
> Sent: 07 March 2009 17:29
> To: Alex Brown; ODF Users List
> Cc: Makoto MURATA; 'marbux'
> Subject: RE: office-comment text:id vs xml:id (ODF 1.2CD01)
> 
> Alex,
> 
> I think the situation with xml:id is much messier than first appears.

Oh my ...

> There are at least two things going on here, as far as I can tell,
> which means the situation may be over-constrained beyond all
> resolution.
> 
>  1. There are pre-existing ODF attributes of ID type that there is an
> effort to deprecate in favor of xml:id instead, some of the time.

Ah, so that explains the convoluted wording.

Trying to run two ID schemes in parallel is just crazy, IMHO.
 
>  2. We don't know, from the ODF specification, what possible references
> to the existing ones, as IDs, are actually defined for ODF, whether via
> URIs (and fragment identifiers) or other means that are ODF-unique
> (though sometimes via an IDREF or IDREFS).

No, the poor naming conventions, attribute overloading and vague documentation make it very hard to work out what is going on.
 
>  3. It would be nice to be able to coin xml:id values on the assumption
> that there are no other attributes of type ID that might present
> conflicts.

Yes. I'm not sure we need to think of this as "nice" though; isn't it just XML schema design 101?

>  (Of course, if you don't know what is going on,
> intentionally duplicating the value of some *:id attribute value for an
> xml:id attribute on the same element is craziness. 

+1

> Dealing with *:name
> attributes is out of the question, especially since most of them are of
> type string and they are not always instances of identifier assignment
> or of identifier reference.)

Of course, if elements need labels, titles or "alt" text, that is another question. But those things are not (and should not be) ids, let alone IDs.
 
>  4. Oh, and we don't know which of the pre-existing cases are meant to
> be used in fragment identifiers for URI reference as part of ODF-
> specific provisions, but that is mainly the whole point in moving
> toward a use of xml:id (i.e., for RDF-carried URIs and other cases
> that may arise).

Yes - and the point of the standard should be to embody good design and explain it clearly. The opposite of what's going on with this stuff at the moment.
 
>  - Dennis
> 
> This message is my personal observation and any similarity to an
> official position of the ODF TC or of OASIS is purely coincidental.
> Were there such an official position, there'd be provision of a link to
> the official minutes or other approved document where the official
> position is expressed.

Understood!


> Dennis E. Hamilton
> ------------------
> NuovoDoc: Design for Document System Interoperability
> mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430
> http://NuovoDoc.com http://ODMA.info/dev/ http://nfoWorks.org
> 
> ADDITIONAL OBSERVATIONS
> 
>  1. ODF creates a set of not-exactly-ID type values for purposes of its
> document model.  This is the sort of thing that is respected by the
> xml:id specification.  This is also something that seems to me as best
> done without using ID, IDREF, and IDREFS as well as disguised forms of
> those if one is also interested in keeping the constrained
> document-model cases separate from arbitrary use of fragment identifiers on
> elements (as is done with the introduction of the RDF Metadata and
> apparently with some SVG/SMIL bits too).  An important case for these
> different (non-ID) identifiers is their direct use in connecting
> material in different XML Document parts of an ODF Document package by
> references that are not URIs (i.e., they are more akin to IDREF and
> IDREFS in usage but hopefully not implementation).  The problem that
> I'm observing is that the commingling of these with (or replacement by)
> xml:id is not thought out.

Right, the crappy old stuff needs to be ripped out and replaced with a working xml:id-based scheme, IMO.

(Or maybe there needs to be an "ODF transitional" where the bad practice stuff can be quarantined; if this kind of stuff comes to JTC 1 that's its likely fate, I suspect.)
 
>   1.1 The clash we are having is with the half-baked notion that we
> should just settle on xml:id for all of it.  The insane part is that
> the ODF specification, now and previously, does not make it clear which
> occurrences of its pseudo-ID and name types are establishing the
> identification of an entity, what the domain of uniqueness is, and what
> are the occurrences that are to be interpreted as references to one of
> those entities so-identified.

This obviously needs to be sorted out. But that's on the TC's TODO list, right?
 
>   1.2 It is my considered assessment that it is not possible to know
> the intention here, and the only way we will be able to figure it out
> is to document what some implementations actually do, abstract it
> usefully, and see if the ODF TC will actually agree that is what is
> required.  You can see that, absent such analysis, willy-nilly
> substituting xml:id everywhere is a serious problem.

That might be an approach ... it assumes there is a workable system lurking in the past. My inclination would be for a clean break here.
 
>  2. Just to show what the half-bakedness is, consider this:
> 
>    2.1 If you look at ODF 1.1, you will see that @text:id is of type
> ID.  It is being deprecated (whatever that means)

Is that mentioned anywhere?

> in 1.2 and, as part
> of changing it to type NCName in ODF 1.2, there is bizarre language
> around the way its value and the now-required (when and why?) xml:id
> value are to be in agreement.  How one deprecates text:id in favor of
> xml:id is not stated in 18.920, but the game is described in 18.814.9
> and other *:id deprecations have similar language.  The ultimate
> strangeness is the sudden promotion of an existing text:id from type
> NCName to type ID if there is no xml:id on the same element.

Strange indeed.
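
For what it's worth, here is how I read the rule described above, as a rough Python sketch. The function names and the sample markup are hypothetical, and the logic is only my reading of the behaviour described in this thread, not the normative text of the draft.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
TEXT_ID = f"{{{TEXT_NS}}}id"
XML_ID = f"{{{XML_NS}}}id"


def effective_id(elem):
    """Return (value, source) for the attribute acting as the element's ID.

    Assumed reading: xml:id wins when present; otherwise an existing
    text:id is "promoted" to play the ID role. Returns None if neither
    attribute is present.
    """
    xml_id = elem.get(XML_ID)
    if xml_id is not None:
        return (xml_id, "xml:id")
    text_id = elem.get(TEXT_ID)
    if text_id is not None:
        return (text_id, "text:id")
    return None


def ids_agree(elem):
    """True unless both attributes are present with different values."""
    xml_id = elem.get(XML_ID)
    text_id = elem.get(TEXT_ID)
    return xml_id is None or text_id is None or xml_id == text_id


# Hypothetical sample markup, for illustration only.
SAMPLE = (
    f'<root xmlns:text="{TEXT_NS}">'
    '<text:p text:id="a1"/>'
    '<text:p xml:id="b2" text:id="b2"/>'
    '<text:p xml:id="c3" text:id="zzz"/>'
    "</root>"
)

if __name__ == "__main__":
    for p in ET.fromstring(SAMPLE):
        print(effective_id(p), ids_agree(p))
```

The third paragraph in the sample is the awkward case: both attributes are present but disagree, and a processor has to pick one.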

>    2.2 Now, although it was recently denied in a remark on the ODF TC
> list, this smells of a way to make a transition from what was a
> specific attribute of ID type to substitute use of xml:id, but in a way
> where the existing (pseudo-) IDREF and IDREFS will still work
> down-level and as an upward compatibility.  I have not figured out any other
> explanation.

This is, I think, partly tied up in the faulty ID declarations in the schema (pre MB's very latest revision). I believe it was I who first suggested using xml:id as a way of getting away from the problematic RELAX NG DTD Compatibility spec.

Using xml:id is a fine idea, still. 

It's just not been done very well.

>  There are statements like "Applications that write
> documents may still write text:id attributes for these elements in
> addition to xml:id attributes."  (If I implement knowing this rule, so
> I am going to create a proper xml:id anyhow, why would I ever produce a
> text:id with the same value, knowing that a down-level processor may
> choke on the ID-value clash if it also accepts the xml:id?)

That kind of nonsense needs to come out of ODF.
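
To make the clash concrete, here is a minimal Python sketch. It assumes a hypothetical down-level processor that treats both text:id and xml:id as ID-typed, so every value must be unique across the document; the helper name and the sample markup are mine, for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
TEXT_ID = f"{{{TEXT_NS}}}id"
XML_ID = f"{{{XML_NS}}}id"


def id_clashes(root, id_attrs):
    """Return the sorted values that occur more than once among the
    attributes the processor regards as ID-typed."""
    counts = Counter()
    for elem in root.iter():
        for attr in id_attrs:
            value = elem.get(attr)
            if value is not None:
                counts[value] += 1
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical document written under the "may still write text:id" rule.
SAMPLE = (
    f'<root xmlns:text="{TEXT_NS}">'
    '<text:p xml:id="p1" text:id="p1"/>'
    '<text:p xml:id="p2"/>'
    "</root>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # A processor that only knows xml:id as an ID sees no problem.
    print(id_clashes(root, [XML_ID]))
    # One that treats both attributes as IDs sees the same value twice.
    print(id_clashes(root, [XML_ID, TEXT_ID]))
```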
 
>  3. As a Johnny-come-lately to this situation, I have a different view.
> 
>   3.1 I believe that, insofar as there is no intention that the *:id
> and *:name and similar attributes carried over from ODF 1.1 have an
> ODF-defined use as fragment identifiers, they should all be changed to
> NCNames (although some are of string type in ODF 1.1) and the following
> done: If there are no defined attribute values and elements that give
> rise to references to those thingies, they should simply be deprecated
> along with specification that they be ignored (without knowing why they
> are there, in this case, it is hard to ask that they be removed from
> the specification entirely, but that is an even better idea).  If there
> are defined attribute values and elements that constitute references,
> they should also be adjusted to no longer be treated as IDREF and
> IDREFS cases but as NCName and NCNames cases with the association
> between reference and identified entity clearly established in the ODF
> specification itself, with all of the still-needed explanation on
> agreement between the nature of the referred-to and the referred-from
> and the relevant constraints that are currently left to the
> imagination.

My view is that the schema should be re-written based on a clear understanding of what things are being identified, and what are references to them. We've lost forwards compatibility from previous ODF versions anyway so might as well take the opportunity to go for it.

I wouldn't use IDREF(S) anywhere, but URI references everywhere, in the modern style.
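
As a rough illustration of what I mean, a reference would simply be a URI reference whose fragment names an xml:id. The Python sketch below assumes same-document references only and ignores the path part of the URI; the function name and sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_fragment(root, uri_ref):
    """Return the element whose xml:id equals the fragment of uri_ref,
    or None if there is no fragment or no matching element."""
    _, fragment = urldefrag(uri_ref)
    if not fragment:
        return None
    for elem in root.iter():
        if elem.get(XML_ID) == fragment:
            return elem
    return None


# Hypothetical sample markup, for illustration only.
SAMPLE = (
    f'<root xmlns:text="{TEXT_NS}">'
    '<text:p xml:id="para1">First</text:p>'
    '<text:p xml:id="para2">Second</text:p>'
    "</root>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    target = resolve_fragment(root, "#para2")
    print(target.text if target is not None else "unresolved")
```

Nothing here needs the schema: the referrer only has to know the xml:id value, which is rather the point.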
 
>   3.2 Such adjustment would free up the independent use of xml:id as a
> fragment identifier for any in-ODF or extra-ODF reference to those
> identified entities by URIs.  In this case, the obvious rule is to say
> that xml:id values should be preserved so long as the associated
> element is preserved, simply because the referential-integrity
> constraints are potentially unknown to the ODF processor.

+1

>  This would
> allow accretions such as the RDF Metadata additions of 1.2 to be
> implemented without having to fight over who owns the particular xml:id
> and also with confidence that no attribute but xml:id introduces ID
> type values into the domain of required-to-be-unique ID values.

The RDF stuff might as well not be there at the moment, as implementations are free to dump all the IDs the RDF would use.

>  (That
> is, one does not have to know the schema or the specification to insert
> an xml:id, to refer to it with a URI, and to preserve it so long as the
> corresponding element is untouched.  This seems to be as good as it
> gets with ID type assignment and value uniqueness conditions without
> also having document-model constraints too.)
> 
>   3.3 This separation of concerns may be impossible (technically, not
> just ideologically), since there may well be fragment-naming URIs that
> already need to be supported.  Either way, we need to figure that out.
> That or just put our heads on our desks and weep.

I'm looking forward to this stuff getting sorted out!

- Alex.