Subject: Re: [dita] Unique topic ids in the cross publication or global CMS use case
From: Eliot Kimber <ekimber@rsicms.com>
To: Jim Tivy <jimt@bluestream.com>, dita <dita@lists.oasis-open.org>
Date: Sun, 16 Jun 2013 08:09:59 -0500
Jim,

I think your concern is addressed by the current cross-deliverable
addressing proposal: it does in fact propose the use of keys, and mappings
from keys to locations as delivered, as the way to ensure reliable
cross-deliverable addressing. The proposal as documented should make it
clear that processors are obligated to manage a mapping from objects as
authored to objects as delivered such that any delivery constraints (for
example, a requirement that topic IDs be unique within a publication) are
not imposed back onto the authored content.

If the proposal is not sufficiently clear on that point then we must correct
it. Because I am so deeply into issues of linking and addressing I often
forget that what to me seems obvious is in fact not at all obvious.

Perhaps it's useful to discuss the general issue of topic IDs and their
non-requirement for uniqueness in the context of addressing generally. I
think there is some general misunderstanding in the community about what
is and isn't required, and probably some poor implementation choices made
long ago that still linger. I don't fault implementors for not always
understanding the subtleties of addressing--it's a challenging subject.

-----------------------------------
Topic ID Uniqueness Is Not Required

Topic *IDs* are not required to be unique outside the context of their
containing XML document, nor do they need to be.

However, topic document addresses *are* necessarily unique, because the XML
documents that contain topics are distinct storage objects, which means they
have a unique location within the storage system that contains them and that
storage system has a unique location within the set of all possible storage
locations. That's how storage systems work.

In the world of the Web, every storage system exists on some kind of server
with a unique IP address. The storage system itself then exists at some
unique location within that server, and the resources managed by the storage
system then have unique locations, e.g., filenames, object IDs, or what have
you. 

Thus, every *topic* has a unique URL/ID pair that distinguishes it from *all
possible other topics* in existence at any moment in time.
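
As a rough illustration of the URL/ID pair (a sketch only; the class name
and the example.com URLs are hypothetical), two topics can share the same
ID and still have distinct addresses:

```python
from typing import NamedTuple


class TopicAddress(NamedTuple):
    """A topic's full address: its document's location plus its ID."""
    document_url: str  # location of the containing XML document
    topic_id: str      # XML ID, unique only within that document


# Hypothetical locations; both topics were authored with the ID "intro".
a = TopicAddress("http://example.com/pubs/guide-a/intro.dita", "intro")
b = TopicAddress("http://example.com/pubs/guide-b/intro.dita", "intro")

assert a.topic_id == b.topic_id  # the IDs collide ...
assert a != b                    # ... but the addresses do not
```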

Thus the ID of the <topic> element is necessary *only* to distinguish
different topics within the same *XML document*. But that requirement is
imposed by XML itself since DITA defines topic IDs as XML IDs.

If an XML document consists of exactly one topic, then addressing the
document is sufficient to reliably address the topic (by the rules of DITA
addressing) and in that case the topic ID is only of interest for addressing
elements within the topic, because DITA fragment identifiers are
{topicid}/{elementid} pairs. But even there, the value "topicid" for all
topic IDs in this case is as good as anything.
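
To make the {topicid}/{elementid} form concrete, here is a minimal sketch
of splitting a DITA reference into its parts. The function and type names
are hypothetical, and the sketch ignores everything except the
document#topicid/elementid shape discussed above:

```python
from typing import NamedTuple, Optional
from urllib.parse import urldefrag


class DitaRef(NamedTuple):
    document: str              # address of the XML document
    topic_id: Optional[str]    # None means "no fragment identifier given"
    element_id: Optional[str]  # None means "the topic itself"


def parse_dita_ref(href: str) -> DitaRef:
    """Split 'doc.dita#topicid/elementid' into its three components."""
    document, fragment = urldefrag(href)
    if not fragment:
        return DitaRef(document, None, None)
    topic_id, _, element_id = fragment.partition("/")
    return DitaRef(document, topic_id, element_id or None)


print(parse_dita_ref("install.dita#topicid/step1"))
print(parse_dita_ref("install.dita"))
```

With one topic per document, the second call shows that the document
address alone already identifies the topic.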

For the purposes of addressing in deliverables, there is no need for topic
IDs to be unique because the processor that generates the deliverable can
ensure that the IDs used in the deliverable are unique within that
deliverable. The deliverable is itself a storage object (or collection of
storage objects) that, like all storage objects, has identity within the
set of all possible storage objects.

In addition, the processor that produces the deliverable necessarily has
access to the information required to maintain the mapping from objects as
authored (that is, topic IDs, element IDs, and keys) to their locations as
delivered. This is true because the processor must have both the original
source and the deliverable it generated available to it--this does not mean
that all existing processors were implemented in such a way that this
information is maintained, only that they all *could have been*.

So again, addressability is assured as long as the processor generating the
output generates unique IDs for any addressable things put into the
deliverable and maintains the source-to-deliverable address mapping.
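
One way a processor might do this, sketched under the assumption that a
source topic is identified by a (document location, topic ID) pair; the
function name and the "unique_N" ID scheme are invented for illustration
and do not describe any existing processor:

```python
import itertools
from typing import Dict, List, Tuple

SourceAddress = Tuple[str, str]  # (document location, topic ID as authored)


def assign_deliverable_ids(topics: List[SourceAddress]) -> Dict[SourceAddress, str]:
    """Give each distinct source topic an ID unique within one deliverable."""
    counter = itertools.count(1)
    mapping: Dict[SourceAddress, str] = {}
    for source in topics:
        if source not in mapping:  # the same source topic keeps one ID
            mapping[source] = f"unique_{next(counter)}"
    return mapping


# Two different documents both use the topic ID "topicid".
topics = [
    ("chapters/install.dita", "topicid"),
    ("chapters/config.dita", "topicid"),
    ("chapters/config.dita", "advanced"),
]
mapping = assign_deliverable_ids(topics)
for source, delivered_id in mapping.items():
    print(source, "->", delivered_id)
```

The returned dictionary is the source-to-deliverable address mapping; the
authored IDs are never changed.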

If you need to do cross-deliverable addressing then you need to have a
mapping from the locations (not just IDs) of the things as authored to the
locations of the things as delivered. That mapping could be managed in many
ways but the current cross-deliverable proposal does it through the use of
keys and intermediate key definition sets that map the keys as used in the
content as authored to the locations of the key-bound resources in the
deliverable. That is sufficient to support the requirement for
addressability.
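
The general idea can be sketched as a lookup from keys as authored to
locations as delivered. This is not the proposal's actual data format; the
deliverable name, keys, URLs, and function below are all hypothetical:

```python
from typing import Dict

# Hypothetical key definition set produced for deliverable "guide-a":
# key as authored -> location of the key-bound resource as delivered.
GUIDE_A_KEYS: Dict[str, str] = {
    "install-overview": "http://example.com/guide-a/install.html#unique_1",
    "config-advanced": "http://example.com/guide-a/config.html#unique_3",
}

# Key definition sets available to the processor, by deliverable name.
KEY_SETS: Dict[str, Dict[str, str]] = {"guide-a": GUIDE_A_KEYS}


def resolve_cross_deliverable(deliverable: str, key: str) -> str:
    """Return the delivered location bound to `key` in another deliverable."""
    try:
        return KEY_SETS[deliverable][key]
    except KeyError:
        raise LookupError(f"no binding for key {key!r} in {deliverable!r}") from None


print(resolve_cross_deliverable("guide-a", "install-overview"))
```

A processor building a second deliverable only needs the key and this
table; it never needs to know, or constrain, the topic IDs as authored.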

In addition, the @copy-to attribute on <topicref> gives authors additional
control over deliverable addresses by allowing the assignment of new virtual
source storage object locations ("filenames") for distinct references to the
same topic or map. That doesn't remove the requirement for
source-to-deliverable address mapping, but it means that authors may
influence the details of the result.
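
As a small sketch of the effect (the map content and file names are made
up, and real processors do considerably more than this), a processor
could treat the @copy-to value, when present, as the effective source
location for that particular reference:

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical map: the same topic is referenced twice, once with @copy-to.
MAP_XML = """
<map>
  <topicref href="shared/warning.dita"/>
  <topicref href="shared/warning.dita" copy-to="shared/warning-install.dita"/>
</map>
"""


def effective_locations(map_xml: str) -> List[str]:
    """Return the effective source location for each topicref, in order."""
    root = ET.fromstring(map_xml)
    return [ref.get("copy-to") or ref.get("href") for ref in root.iter("topicref")]


print(effective_locations(MAP_XML))
```

The two references to one topic end up with two distinct locations, which
the source-to-deliverable mapping can then carry through to the output.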

The DITA 1.2 spec doesn't say anything about topic ID uniqueness because it
doesn't need to. Topic IDs don't need to be unique, except as already
required by XML rules.

It can be a *convenience* to assign unique IDs to the topics under your
control, but there is no way that any agency short of the divine can ensure
global ID uniqueness unless we mandate the use of a specific UUID generator.

By the same token, there's nothing wrong with making your topic IDs globally
unique if you want to; it's just not necessary and could be a waste of
effort. Or it could be a useful simplifying strategy. A typical use case
might be to make topic IDs be object IDs of topics managed in a component
content management system. That's fine as long as everyone is clear that
these IDs can at best be unique within the scope of that one component
content management system instance (even if you're using some sort of UUID
generator there's always the chance, however remote, that somebody might
randomly choose the same ID for one of their topics).
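
If you do go the UUID route, a minimal sketch might look like the
following. The function name and the "t" prefix are arbitrary choices;
the prefix is there because an XML ID cannot start with a digit, which a
bare UUID may do:

```python
import uuid


def new_topic_id() -> str:
    """Return a random ID that is also a valid XML ID."""
    # uuid4().hex is 32 hexadecimal characters and may begin with a digit,
    # so prepend a letter to keep the value a legal XML name.
    return f"t{uuid.uuid4().hex}"


print(new_topic_id())
```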

Cheers,

Eliot

On 6/15/13 8:08 PM, "Jim Tivy" <jimt@bluestream.com> wrote:

> Hi Folks
>  
> I have found numerous discussions that topic id is not required to be unique
> within a publication or collection of topics - none of these discussions in
> the current 1.2 specification (that I could find anyhow) - although omission
> means no requirement.
> One such reference was:
> http://tech.groups.yahoo.com/group/dita-users/message/14260
> Of course topic id does have to be unique within an XML document - that is not
> what I am talking about here - rather I am addressing intra publication
> uniqueness or even global uniqueness.
> Some PDF processors, such as the PDF5 processor for Antenna House, however,
> require that topic ids do have to be unique within a publication.
> At first it seems like this requirement is overstepping what OASIS has
> recommended (or not recommended through omission).  However, one reason for
> this unique id requirement of PDF5 is to support the cross publication linking
> use case.
> It just so happens that we dealt with this use case recently in approving
> proposal 13041 (Facility for key-based, cross-deliverable referencing
> (Kimber)).
> It seems if we do not recommend or say anything about unique topic ids, then
> we leave processors to "twist in the wind" - or make extra requirements like
> PDF5 did.  On the other hand, if we require unique topic ids, we might be
> pre-supposing certain implementations which in fact are not necessary.
> It seems, however, if we are to add proposals such as 13041, then we might
> want to talk about how cross publication linking might happen - this proposal
> 13041 opens the door to some new possibilities.
>  
> For example, if our references were key rooted, we could use key export tables
> and the processors could do something like the following:
>  
> I use a PDF example here but it may have bearing on other cross publication
> links such as cross chunked HTML.
> In PDF, for example, to allow processor defined unique ids to topics for the
> purposes of merge (Like PDF2 merge) then to link from PDFB to PDFA would
> require PDFA to export its external links to PDFB because the ids of the
> topics in the PDF are not known at author time.
>  
> PDFA (export as XML)
>  
> keyname    newMergetopicId    Original fragmentId
> MyKey1     a223345            be3333333
>  
> Then PDFB consumes this and has a reference to MyKey1/be3333333
>  
> Then when a processor builds PDFB and when it references PDFA with
> MyKey1/be3333333 it would resolve to PDFA.a223345/be3333333
>  
> In this case, a223345 could be entirely generated by the PDF processor when
> PDFA is built, however, be3333333 would remain stable but not unique as a
> fragment Id.
>  
> My question here is, should we say something in the spec or when we document
> proposal 13041 regarding this.  Should we have text that says "we DO NOT
> recommend processors rely on unique topic ids within a publication" or "we DO
> recommend - same".
>  
> cheers
> Jim

-- 
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press,
http://xmlpress.net/publications/dita/practitioners-1/