Subject: RE: [dita] Unique topic ids in the cross publication or global CMS use case
From: "Jim Tivy" <jimt@bluestream.com>
To: "'Eliot Kimber'" <ekimber@rsicms.com>, "'dita'" <dita@lists.oasis-open.org>
Date: Sun, 16 Jun 2013 10:13:38 -0700
Thanks Eliot

That is pretty clear.

The text below, from your PowerPoint, is crystal clear.  In proposal 13041
you have to work harder to get it - you define location as authored and
location as delivered - but the PowerPoint is clearer.
I think DITA 1.3 needs to say something about there being "no requirement
for topic ids to have global uniqueness" because, as you say, there is a
misconception on this and we are not a strictly normative spec.

******************** your text from powerpoint ***************

Addressing within the content as authored:
Defined by the source format, e.g., DITA XML
For XML source, should be independent of any given output format
DITA defines the rules for addressing within DITA XML


Addressing from the publication as delivered:
Defined by the delivery format: PDF, HTML, EPUB, etc.
No single standard
Details may be proprietary

************************************************************

> -----Original Message-----
> From: Eliot Kimber [mailto:ekimber@rsicms.com]
> Sent: June-16-13 6:10 AM
> To: Jim Tivy; dita
> Subject: Re: [dita] Unique topic ids in the cross publication or global
> CMS use case
> 
> Jim,
> 
> I think your concern is addressed by the current cross-deliverable
> addressing proposal: it does in fact propose the use of keys and mappings
> from keys to locations as delivered as the way to ensure reliable
> cross-deliverable addressing. The proposal as documented should make it
> clear that processors are obligated to manage a mapping from objects as
> authored to objects as delivered such that any delivery constraints are
> not imposed back onto the authored content, for example, a requirement
> that topic IDs be unique within a publication.
> 
> If the proposal is not sufficiently clear on that point then we must
> correct it. Because I am so deeply into issues of linking and addressing
> I often forget that what to me seems obvious is in fact not at all
> obvious.
> 
> Perhaps it's useful to discuss the general issue of topic IDs and their
> non-requirement for uniqueness in the context of addressing generally. I
> think there is some general misunderstanding in the community about what
> is and isn't required, and probably some poor implementation choices made
> long ago that still linger in our community. I don't fault implementors
> for not always understanding the subtleties of addressing--it's a
> challenging subject.
> 
> -----------------------------------
> Topic ID Uniqueness Is Not Required
> 
> Topic *IDs* are not required to be unique outside the context of their
> containing XML document, nor do they need to be.
> 
> However, topic document addresses *are* necessarily unique, because the
> XML documents that contain topics are distinct storage objects, which
> means they have a unique location within the storage system that contains
> them and that storage system has a unique location within the set of all
> possible storage locations. That's how storage systems work.
> 
> In the world of the Web, every storage system exists on some kind of
> server with a unique IP address. The storage system itself then exists at
> some unique location within that server, and the resources managed by the
> storage system then have unique locations, e.g., filenames, object IDs,
> or what have you.
> 
> Thus, every *topic* has a unique URL/ID pair that distinguishes it from
> *all possible other topics* in existence at any moment in time.
> 
> Thus the ID of the <topic> element is necessary *only* to distinguish
> different topics within the same *XML document*. But that requirement is
> imposed by XML itself since DITA defines topic IDs as XML IDs.
> 
> If an XML document consists of exactly one topic, then addressing the
> document is sufficient to reliably address the topic (by the rules of
> DITA addressing) and in that case the topic ID is only of interest for
> addressing elements within the topic, because DITA fragment identifiers
> are {topicid}/{elementid} pairs. But even there, the value "topicid" for
> all topic IDs in this case is as good as anything.
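
The point about URL/ID pairs can be illustrated with a minimal Python sketch. It assumes references of the form document#topicid/elementid; the example.com URLs, the DitaAddress type, and the parse_dita_address helper are hypothetical names used only for illustration, not part of any DITA toolkit.

```python
from typing import NamedTuple, Optional
from urllib.parse import urldefrag


class DitaAddress(NamedTuple):
    document: str              # location of the XML document (storage object)
    topic_id: Optional[str]    # only has to be unique within that document
    element_id: Optional[str]  # addressed relative to the topic


def parse_dita_address(href: str) -> DitaAddress:
    """Split 'doc.dita#topicid/elementid' into document, topic, element."""
    document, fragment = urldefrag(href)
    if not fragment:
        # No fragment: the document address alone identifies the topic.
        return DitaAddress(document, None, None)
    topic_id, _, element_id = fragment.partition("/")
    return DitaAddress(document, topic_id, element_id or None)


# Two single-topic documents that both use the literal ID "topicid".
a = parse_dita_address("http://example.com/docs/install.dita#topicid/step1")
b = parse_dita_address("http://example.com/docs/remove.dita#topicid/step1")

print(a.topic_id == b.topic_id)  # True: the topic IDs collide
print(a == b)                    # False: the full addresses do not
```

The topic IDs are identical, yet the two addresses stay distinct because the document location is part of each address.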
> 
> For the purposes of addressing in deliverables, there is no need for
> topic IDs to be unique because the processor that generates the
> deliverable can ensure that the IDs used in the deliverable are unique
> within that deliverable. The deliverable is itself a storage object (or
> collection of storage objects) that, like all storage objects, has
> identity within the set of all possible storage objects.
> 
> In addition, the processor that produces the deliverable must be able to
> have the information required to maintain the mapping from objects as
> authored (that is, topic IDs, element IDs, and keys) to their locations
> as delivered. This is true because the processor must have both the
> original source and the deliverable it generated available to it--this
> does not mean that all existing processors were implemented in such a way
> that this information is maintained, only that they all *could have
> been*.
> 
> So again, addressability is assured as long as the processor generating
> the output generates unique IDs for any addressable things put into the
> deliverable and maintains the source-to-deliverable address mapping.
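
As a rough sketch of what such a processor might keep track of, the following Python class hands out IDs that are unique within one deliverable and remembers which authored address each one came from. The class name, the "d" prefix, and the counter-based ID scheme are assumptions made for illustration; real processors may generate and store IDs quite differently.

```python
from itertools import count
from typing import Dict, Tuple

# An object as authored: (document location, topic ID within that document).
SourceAddress = Tuple[str, str]


class DeliverableIdMap:
    """Assigns deliverable-unique IDs and keeps the source-to-deliverable map."""

    def __init__(self, prefix: str = "d") -> None:
        self._prefix = prefix
        self._counter = count(1)
        self._by_source: Dict[SourceAddress, str] = {}

    def id_for(self, document: str, topic_id: str) -> str:
        """Return the deliverable ID for an authored topic, creating it once."""
        source = (document, topic_id)
        if source not in self._by_source:
            self._by_source[source] = f"{self._prefix}{next(self._counter)}"
        return self._by_source[source]

    def mapping(self) -> Dict[SourceAddress, str]:
        """Copy of the full source-to-deliverable mapping."""
        return dict(self._by_source)


ids = DeliverableIdMap()
first = ids.id_for("install.dita", "topicid")
second = ids.id_for("remove.dita", "topicid")  # same authored ID, other file
print(first, second)  # d1 d2
```

Nothing here constrains the authored IDs: both topics keep the ID "topicid" in the source, and uniqueness is established only in the deliverable.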
> 
> If you need to do cross-deliverable addressing then you need to have a
> mapping from the locations (not just IDs) of the things as authored to
> the locations of the things as delivered. That mapping could be managed
> in many ways but the current cross-deliverable proposal does it through
> the use of keys and intermediate key definition sets that map the keys as
> used in the content as authored to the locations of the key-bound
> resources in the deliverable. That is sufficient to support the
> requirement for addressability.
> 
> In addition, the @copy-to attribute on <topicref> gives authors
> additional control over deliverable addresses by allowing the assignment
> of new virtual source storage object locations ("filenames") for distinct
> references to the same topic or map. That doesn't remove the requirement
> for source-to-deliverable address mapping, but it means that authors may
> influence the details of the result.
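
A deliberately simplified Python sketch of the @copy-to idea follows: each reference to a topic either keeps the original location or is given a new virtual one. The TopicRef type, the effective_location helper, and the file names are hypothetical, and the sketch ignores chunking and every other processing rule that affects real output.

```python
from typing import NamedTuple, Optional


class TopicRef(NamedTuple):
    href: str                      # location of the topic as authored
    copy_to: Optional[str] = None  # author-assigned virtual location, if any


def effective_location(ref: TopicRef) -> str:
    """Location the processor treats as the source for this reference."""
    return ref.copy_to or ref.href


# Two references to the same authored topic; the second uses copy-to.
refs = [
    TopicRef("shared/warning.dita"),
    TopicRef("shared/warning.dita", copy_to="install/warning.dita"),
]
for ref in refs:
    print(ref.href, "->", effective_location(ref))
```

The processor still has to map each effective location to its address in the deliverable; copy-to only changes the input to that mapping.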
> 
> The DITA 1.2 spec doesn't say anything about topic ID uniqueness because
> it doesn't need to. Topic IDs don't need to be unique, except as already
> required by XML rules.
> 
> It can be a *convenience* to assign unique IDs to the topics under your
> control, but there is no way that any agency short of the divine can
> ensure global ID uniqueness unless we mandate the use of a specific UUID
> generator.
> 
> By the same token, there's nothing wrong with making your topic IDs
> globally unique if you want to, it's just not necessary and could be a
> waste of effort. Or it could be a useful simplifying strategy. A typical
> use case might be to make topic IDs be object IDs of topics managed in a
> component content management system. That's fine as long as everyone is
> clear that these IDs can at best be unique within the scope of that one
> component content management system instance (even if you're using some
> sort of UUID generator there's always the chance, however remote, that
> somebody might randomly choose the same ID for one of their topics).
> 
> Cheers,
> 
> Eliot
> 
> On 6/15/13 8:08 PM, "Jim Tivy" <jimt@bluestream.com> wrote:
> 
> > Hi Folks
> >
> > I have found numerous discussions that topic id is not required to be
> > unique within a publication or collection of topics - none of these
> > discussions are in the current 1.2 specification (that I could find
> > anyhow) - although omission means no requirement.
> > One such reference was:
> > http://tech.groups.yahoo.com/group/dita-users/message/14260
> > Of course topic id does have to be unique within an XML document -
> > that is not what I am talking about here - rather I am addressing
> > intra publication uniqueness or even global uniqueness.
> > Some PDF processors, such as the PDF5 processor for Antenna House,
> > however, require that topic ids be unique within a publication.
> > At first it seems like this requirement is overstepping what OASIS has
> > recommended (or not recommended through omission).  However, one
> > reason for this unique id requirement of PDF5 is to support the cross
> > publication linking use case.
> > It just so happens that we dealt with this use case recently in
> > approving proposal 13041 (Facility for key-based, cross-deliverable
> > referencing (Kimber)).
> > It seems if we do not recommend or say anything about unique topic
> > ids, then we leave processors to "twist in the wind" - or make extra
> > requirements like PDF5 did.  On the other hand, if we require unique
> > topic ids, we might be pre-supposing certain implementations which in
> > fact are not necessary.
> > It seems, however, if we are to add proposals such as 13041, then we
> > might want to talk about how cross publication linking might happen -
> > this proposal 13041 opens the door to some new possibilities.
> >
> > For example, if our references were key rooted, we could use key export
> > tables and the processors could do something like the following:
> >
> > I use a PDF example here but it may have bearing on other cross
> > publication links such as cross chunked HTML.
> > In PDF, for example, if the processor assigns its own unique ids to
> > topics for the purposes of merge (like the PDF2 merge), then linking
> > from PDFB to PDFA would require PDFA to export its external links to
> > PDFB, because the ids of the topics in the PDF are not known at author
> > time.
> >
> > PDFA (export as XML)
> >
> > keyname    newMergetopicId    Original fragmentId
> > MyKey1     a223345            be3333333
> >
> > Then PDFB consumes this and has a reference to MyKey1/be3333333
> >
> > Then when a processor builds PDFB and when it references PDFA with
> > MyKey1/be3333333 it would resolve to PDFA.a223345/be3333333
> >
> > In this case, a223345 could be entirely generated by the PDF processor
> > when PDFA is built, however, be3333333 would remain stable but not
> > unique as a fragment Id.
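
The lookup described in this example can be sketched in a few lines of Python. The table layout (keyed by key name plus original fragment id), the dictionary and function names, and the output syntax with a "." separator simply follow the example above; none of this is defined by proposal 13041.

```python
from typing import Dict, Tuple

# Export table published when PDFA is built:
# (keyname, original fragment id) -> (deliverable, generated merge topic id)
PDFA_EXPORTS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("MyKey1", "be3333333"): ("PDFA", "a223345"),
}


def resolve(reference: str, exports: Dict[Tuple[str, str], Tuple[str, str]]) -> str:
    """Resolve 'key/fragmentId' as authored to an address in the deliverable."""
    key, _, fragment_id = reference.partition("/")
    deliverable, merged_topic_id = exports[(key, fragment_id)]
    # The generated topic id may change on each build of PDFA; the fragment
    # id stays as authored.
    return f"{deliverable}.{merged_topic_id}/{fragment_id}"


print(resolve("MyKey1/be3333333", PDFA_EXPORTS))  # PDFA.a223345/be3333333
```

Only the export table has to be regenerated when PDFA is rebuilt; the reference authored in PDFB stays the same.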
> >
> > My question here is, should we say something in the spec or when we
> > document proposal 13041 regarding this?  Should we have text that says
> > "we DO NOT recommend processors rely on unique topic ids within a
> > publication" or "we DO recommend same"?
> >
> > cheers
> > Jim
> 
> --
> Eliot Kimber
> Senior Solutions Architect, RSI Content Solutions
> "Bringing Strategy, Content, and Technology Together"
> Main: 512.554.9368
> www.rsicms.com
> www.rsuitecms.com
> Book: DITA For Practitioners, from XML Press,
> http://xmlpress.net/publications/dita/practitioners-1/