Subject: [DITA 1.3/2.0] Keyref Not Sufficient for General Element-to-ElementAddress Indirection
From: Eliot Kimber <ekimber@reallysi.com>
To: dita <dita@lists.oasis-open.org>
Date: Mon, 24 Nov 2008 17:02:02 -0600
I've been struggling ever since we started the DITA 1.2 discussion to
clearly articulate my indirect addressing requirements. At the time, I
proposed a general id-to-id indirection mechanism. That proposal was set
aside in favor of keyref, partly on the argument that keyref satisfied the
indirection requirements. I felt at the time that it did not, but I never
could quite articulate the details.

I now can articulate them. The end of this message contains a potential
solution that I think may be appropriate for proposing in DITA 1.3.

But first an explanation of why I think a solution is needed at all:

The value of indirect addressing is that in the most general case it
provides the following benefits:

1. Protects references from changes in the *location* of the target (that
is, how things are named within a storage or addressing space, e.g., changes
to filenames of XML documents or IDs of elements).

2. Protects references from changes in the *nature* of the target (that is,
how things are organized as structures or into storage objects, e.g.,
splitting one document into two, combining two documents into one, splitting
one element into two, or combining two elements into one).

Item (1) is about moving or renaming things without breaking references.
Without indirection, any change in the location of a target (including
creating a new version of it) necessarily requires updating *all pointers*
to the thing, which requires creating new versions of all those things,
which in turn requires updating any pointers to those things, and so on. So
you either use indirection, or you impose draconian constraints on your
ability to move and rename things, or you give up important functionality,
like the ability to explicitly point to older versions of documents.

By using indirection, you can have exactly one location-specific,
version-specific pointer to a thing and then use that as the target for all
other pointers to the thing. When the location of the thing changes, you
only update one pointer, not an unlimited number. The indirection itself has
an invariant address (meaning that pointers to the indirection object never
have to change).
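
To make that concrete, here is a minimal sketch in Python (not DITA syntax;
the table, the name "install-guide", and the file names are all invented
for illustration). Every reference holds only the invariant name, and a
move or rename touches exactly one entry:

```python
# Hypothetical indirection table: invariant name -> current location.
# The single location-specific pointer lives here and nowhere else.
indirection = {"install-guide": "topics/install_v1.dita"}

# Any number of references carry only the invariant name.
references = ["install-guide"] * 3


def resolve(name):
    """Follow the indirection to the target's current location."""
    return indirection[name]


before = [resolve(r) for r in references]

# The target is renamed (or a new version is created): update one pointer.
indirection["install-guide"] = "topics/install_v2.dita"

# The references themselves are untouched, yet all resolve to the new place.
after = [resolve(r) for r in references]
```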

Item (2) is about splitting or combining things without breaking references.
If a thing breaks into two, you update the original indirection object to
point to both things rather than one thing. Likewise, if two things are
combined into one, the indirections for each of the original things can now
point to the new combined object.
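
A sketch of what that means, again in Python with invented names and file
names (this is the general idea, not something keyref offers): each
indirection entry holds a list of targets, so a split or a merge is a
change to the table rather than to the references.

```python
# Hypothetical indirection table whose entries may list several targets.
indirection = {
    "setup": ["setup.dita"],
    "intro": ["intro.dita"],
    "overview": ["overview.dita"],
}


def resolve(name):
    """Return every current target of the named indirection object."""
    return indirection[name]


# Split: "setup.dita" becomes two documents; the one entry now lists both.
indirection["setup"] = ["setup-hardware.dita", "setup-software.dita"]

# Combine: "intro.dita" and "overview.dita" are merged; both entries point
# at the combined document, so references to either name still resolve.
indirection["intro"] = ["getting-started.dita"]
indirection["overview"] = ["getting-started.dita"]
```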

The keyref mechanism provides only benefit (1); it does not provide benefit
(2), as explained below. In addition, keyref provides generalized
indirection only for *topics*, not for elements within topics. Even so, the
ability to simply move or rename things is tremendously valuable.

The keyref mechanism provides for indirect addressing of *topics* [ignoring
the use of keyref to address elements in <topicmeta>, which is useful but
not relevant to this discussion]. In particular, a reference to a given key
can be redirected to different topics in different map contexts without
modifying the original references to the key itself.

However, this is always a one-to-one indirection: there is no way to
redirect from a reference to a single key to multiple result topics (because
keyref= only takes a single value), which means you can't do something like:

<topicref keys="one" keyref="two three"/>
<topicref keys="two" href="thing-two.dita"/>
<topicref keys="three" href="thing-three.dita"/>

You can map multiple things to a single result by specifying multiple keys
in a keys= attribute, but that has no practical effect in this discussion
because there's no way to have an initial reference to multiple targets.
(That is, you can create multiple aliases for a single topic but you cannot
collapse a single reference to multiple topics into a single result topic.)

This is a limitation in that it does not allow for the case where a single
topic is subsequently split into two or more topics that, as a group, should
be the target of any references to the original topic, except by creating a
parent topic that contains both new topics.

This is probably not a severe limitation in practice but it is a limitation
nevertheless.

Note that, for topics, the keyref mechanism provides for name-remapping
because a key-defining topicref can itself use keyref to point to another
key-defining topicref. This allows you to change the primary name for a topic
without invalidating all existing references to the original name (assuming
you don't then reassign that name to some other topic).
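
As a rough sketch of that name-remapping in Python (a simplified model with
invented key names; it ignores map-context precedence and everything else a
real DITA processor does), resolution just follows keyref links until it
reaches a definition that carries an href:

```python
# Hypothetical model of key definitions in a map: each key either names a
# resource directly (href) or defers to another key (keyref).
key_defs = {
    "old-name": {"keyref": "new-name"},
    "new-name": {"href": "thing.dita"},
}


def resolve_key(key):
    """Follow keyref links until a definition with an href is reached."""
    seen = set()
    while True:
        if key in seen:
            raise ValueError(f"circular key definition involving {key!r}")
        seen.add(key)
        definition = key_defs[key]
        if "href" in definition:
            return definition["href"]
        key = definition["keyref"]
```

Here references to "old-name" and to "new-name" end up at the same topic,
which is why renaming the primary key does not invalidate references that
still use the old one.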

So keyref completely provides value (1) for pointers to topics because you
can map a key to both new storage locations (new href= values) and new names
(other keys). However, it does not completely provide value (1) for elements
and does not provide value (2) at all.

By "elements" I mean elements within topics that may be addressed directly.
This includes elements addressed by conref and xrefs, both of which may
address elements within topics.

Unfortunately, the keyref mechanism only partly provides value (1) for
elements.

This is because keyref only lets you specify a sub-topic element ID as part
of a keyref but does not allow key-defining topicrefs to remap the element
ID, only the key itself.

This means that you cannot change element IDs or split or combine elements
without breaking *all references* to those elements regardless of whether or
not you use keyref.

In fact, the use of keyref adds an additional requirement to coordinate the
IDs and structures used within otherwise unrelated topics so that references
to sub-topic elements will continue to resolve.

In the case of controlled localization of topics this is probably not a
problem in practice because localization typically does not allow structural
changes to topics.

But in the non-localization case, for example, providing a new topic as a
"plug-in" that serves to override an existing topic in an existing body of
data, the creator of the plug-in topic must ensure that all the *potential*
reference target elements (that is, elements with IDs in the original) are
provided for in the plug-in topic. And this is not just at topic creation
time, but over the life span of both the original and plug-in topic because
the plug-in topic has to react to any changes to the original topic that add
or change element IDs that *might* be link targets.
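
To illustrate the coordination burden, here is a sketch in Python of the
check a plug-in author would effectively have to repeat after every change
to the original topic. The ID sets are invented; a real tool would collect
them by parsing the two topics.

```python
# Hypothetical element IDs found in the original topic and in the plug-in
# topic that overrides it under the same key.
original_ids = {"sect-01", "sect-02", "fig-overview"}
plugin_ids = {"sect-01", "fig-overview", "sect-extra"}


def missing_targets(original, plugin):
    """IDs that references may use but the plug-in topic fails to provide."""
    return original - plugin


# Any ID in this set is a potential broken link once the plug-in topic
# replaces the original.
broken = missing_targets(original_ids, plugin_ids)
```

Extra IDs in the plug-in topic are harmless; only IDs present in the
original and absent from the plug-in put existing references at risk.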

This seems like a pretty severe limitation and a pretty onerous content
management/author coordination imposition, one that requires either
abandoning the use of element-to-element links except in very controlled
circumstances or the addition of expensive content management practices
and/or tools. It puts the details of information management back into topics
rather than containing them in maps, where they belong.

The solution would be to provide additional indirection mechanisms for
mapping element IDs to elements, in particular, to allow for renaming of
elements.

Thinking about it now, I think this could be done as a relatively simple
extension to the keyref framework that allows a key-defining topicref to
include a map from element IDs to target elements, e.g., something like:

<topicref keys="key-one" href="my-topic-one.dita">
  <topicmeta>
    <elementMap>
      <elementMapItem id="sect-01" target="s-234642q341"/>
      <elementMapItem id="sect-02" keyref="key-two/sect-02"/>
    </elementMap>
  </topicmeta>
</topicref>
<topicref keys="key-two" href="my-topic-two.dita">
  <topicmeta>
    <elementMap>
      <elementMapItem id="sect-02" target="s-foo"/>
    </elementMap>
  </topicmeta>
</topicref>


Given the above and the keyref value "key-one/sect-01" the ultimate target
would be the element with the ID "s-234642q341" in the topic
"my-topic-one.dita". The keyref "key-one/sect-02" would resolve to the
element with the ID "s-foo" in the topic "my-topic-two.dita".
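
The resolution rule this implies can be sketched in Python. This is only an
illustration of the proposal: the dictionary layout and the function name
are invented, and the fallback for IDs with no elementMapItem (pass the ID
through unchanged, as keyref does today) is my assumption, not something
the markup above specifies.

```python
# Hypothetical in-memory form of the two key definitions above.
# "element_map" mirrors the proposed <elementMap>: each entry either names
# a target ID in this key's own topic or defers to another key's map.
key_defs = {
    "key-one": {
        "href": "my-topic-one.dita",
        "element_map": {
            "sect-01": {"target": "s-234642q341"},
            "sect-02": {"keyref": "key-two/sect-02"},
        },
    },
    "key-two": {
        "href": "my-topic-two.dita",
        "element_map": {
            "sect-02": {"target": "s-foo"},
        },
    },
}


def resolve(keyref):
    """Resolve "key/element-id" to a (topic href, element ID) pair."""
    seen = set()
    while True:
        if keyref in seen:
            raise ValueError(f"circular element mapping at {keyref!r}")
        seen.add(keyref)
        key, element_id = keyref.split("/", 1)
        definition = key_defs[key]
        item = definition.get("element_map", {}).get(element_id)
        if item is None:
            # Assumption: an unmapped ID is used as-is in this key's topic.
            return definition["href"], element_id
        if "target" in item:
            return definition["href"], item["target"]
        # Otherwise the item redirects to another key's element map.
        keyref = item["keyref"]
```

With this model, "key-one/sect-01" stays within my-topic-one.dita, while
"key-one/sect-02" is redirected through key-two before a target ID is
found.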

This would satisfy the renaming and moving requirement for elements while
still imposing the one-to-one mapping of source to target currently imposed
by keyref (which is consistent with the general one-to-one addressing and
linking constraints already imposed by DITA 1.x).

This design feels to me like a natural extension to the base keyref=
mechanism that doesn't require, as my original proposal did, a completely
separate mechanism for indirecting element ID references.

It would remove the need for topic authors to maintain an id-by-id match to
other topics that use the same key and move the address management work into
the map, where it belongs, rather than into topics, which should not have to
know anything about other topics.

Cheers,

Eliot

----
Eliot Kimber | Senior Solutions Architect | Really Strategies, Inc.
email:  ekimber@reallysi.com <mailto:ekimber@reallysi.com>
office: 610.631.6770 | cell: 512.554.9368
2570 Boulevard of the Generals | Suite 213 | Audubon, PA 19403
www.reallysi.com <http://www.reallysi.com>  | http://blog.reallysi.com
<http://blog.reallysi.com> | www.rsuitecms.com <http://www.rsuitecms.com>