
Subject: Worked Processing Example for Michael's Key-Based Approach to Peer Reference Management
From: Eliot Kimber <ekimber@rsicms.com>
To: dita <dita@lists.oasis-open.org>
Date: Sat, 20 Oct 2012 10:06:52 -0500
This is my worked example of Michael's key-based approach to peer reference
management, as mentioned in the separate "New proposal" thread I started.
This post is provided for information.

Imagine a system of two peer publications, Pub A and Pub B.

Pub A has an xref to topic T1.dita:

  <p>See <xref keyref="pubB-T1"/> for more information.</p>

In Pub A as authored, the keydef used by the xref is:

<keydef keys="pubB-T1"
  href="../../common/topics/T1.dita"
  scope="peer"
/>

[The intent of the author, as reflected in the key name, is to link to topic
T1 *as used by* publication Pub B. But nothing in either the original xref
or the keydef can tell you that you mean T1 as used by Pub B, because DITA
1.2 provides no defined way to indicate which use of a peer topic you mean.
See the separate discussion around proposal 13041. For the purposes of this
exercise, we will assume that the association is made in some
processor-specific way, for example by manual modification of generated
keydef files.]

Assume that the source location of topic T1 is not specific to any single
publication, meaning that you cannot infer from its source location alone
what publication it is probably published in.

You process Pub A and the processor informs you that it doesn't know what
result target to use for the peer topic with the key pubB-T1, because while
it knows the *source* location of the topic, you haven't told it where the
*published* location of that topic will be.

You realize that you must first publish Pub B in order to know where it will
be published, and thus where T1.dita will be in whatever outputs you publish
it to.

As a side effect of processing Pub A, the system creates a data set
indicating that topic "T1.dita" is referenced from Pub A using the key name
"pubB-T1". This data set could take any form, but an obvious form for it
would be key definitions, as that's an inherently interchangeable format
that doesn't necessarily depend on any special processing. Let's call this
the "as-referenced" keydef set for Pub A.
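As a rough illustration, producing the as-referenced keydef set can be as simple as copying the peer-scope keydefs out of Pub A's map. This is a minimal Python sketch using the standard library's ElementTree; it assumes the peer references are exactly the keydefs with scope="peer" in the root map. A real processor would also walk submaps and normalize each href relative to the map that contains it, which is omitted here, and the map fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of Pub A's root map: one peer keydef (from the example
# above) and one ordinary local keydef that must not be exported.
PUB_A_MAP = """<map>
  <keydef keys="pubB-T1" href="../../common/topics/T1.dita" scope="peer"/>
  <keydef keys="local-topic" href="topics/local.dita"/>
</map>"""


def as_referenced_keydefs(map_xml: str) -> ET.Element:
    """Collect the peer-scope keydefs of a map into a standalone keydef set."""
    source = ET.fromstring(map_xml)
    result = ET.Element("map")
    for keydef in source.iter("keydef"):
        if keydef.get("scope") == "peer":
            # Keep only what the peer publication needs to find the topic.
            ET.SubElement(result, "keydef", {
                "keys": keydef.get("keys", ""),
                "href": keydef.get("href", ""),
                "scope": "peer",
            })
    return result


print(ET.tostring(as_referenced_keydefs(PUB_A_MAP), encoding="unicode"))
```

The printed set contains only the pubB-T1 keydef; the local keydef stays behind because it is not a peer reference.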

You process Pub B, specifying the as-referenced keydef set for Pub A as a
parameter to the Pub B publication process.

The processor sees that topic "T1.dita" is referenced by Pub A and so
generates a new set of key definitions reflecting the location of T1.dita as
published, in this case as HTML file T1.html in some location specific to
publication Pub B. Let's call this the "as-published" keydef set for Pub B,
reflecting the published location of each topic in Pub B that is referenced
by Pub A. (The processor would generate one as-published keydef set for each
peer publication for which an as-referenced keydef set was provided as input
to the processing of Pub B.)
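The matching step in Pub B's processing might look something like the following Python sketch. It assumes (hypothetically) that all publications live in one repository, so an href from Pub A's map can be normalized to a repository-relative path, and that Pub B's build records where it wrote each source topic. The paths and the names repo_path and match_references are illustrative only.

```python
import posixpath


def repo_path(map_dir: str, href: str) -> str:
    """Resolve a map-relative href to a path relative to the shared repository root."""
    return posixpath.normpath(posixpath.join(map_dir, href))


def match_references(as_referenced: dict[str, str], referencing_map_dir: str,
                     published_here: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Pair each referenced key with the published location of its topic.

    as_referenced:  key -> href, as it appears in the referencing publication's map
    published_here: repository path of a source topic -> path of its rendition,
                    as recorded by this publication's build
    Returns (key -> published path, keys whose topics this publication does not publish).
    """
    found: dict[str, str] = {}
    missing: list[str] = []
    for key, href in as_referenced.items():
        source = repo_path(referencing_map_dir, href)
        if source in published_here:
            found[key] = published_here[source]
        else:
            missing.append(key)
    return found, missing


# Hypothetical inputs: Pub A's map lives in pubs/A/, and Pub B's HTML build recorded
# that common/topics/T1.dita was written to PubB/common/topics/T1.html.
refs_from_pub_a = {"pubB-T1": "../../common/topics/T1.dita"}
pub_b_html = {"common/topics/T1.dita": "PubB/common/topics/T1.html"}

found, missing = match_references(refs_from_pub_a, "pubs/A", pub_b_html)
print(found)    # {'pubB-T1': 'PubB/common/topics/T1.html'}
print(missing)  # []
```

Keys that land in `missing` are references to topics Pub B does not publish; they would be left for some other peer's as-published set to satisfy.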

You would also specify as part of this process the root published locations
for all of the different possible published forms of Pub A, so that the
processor can construct working relative or absolute URLs from Pub B, as
published, to Pub A, as published. This information could, of course, be in
the as-referenced keydef set for Pub A.
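Given the published roots, constructing the actual link is ordinary path and URL arithmetic. The sketch below assumes a hypothetical deployment in which both publications sit under one shared output root (for relative links), or in which Pub B has a known base URL (for absolute links); the directory names and example.com URL are placeholders.

```python
import posixpath
from urllib.parse import urljoin


def relative_link(from_page_dir: str, target: str) -> str:
    """Relative URL from a directory in Pub A's output to a file in Pub B's output.

    Both paths are relative to a shared output root, so this only works when the
    two publications are deployed side by side under that root.
    """
    return posixpath.relpath(target, from_page_dir)


def absolute_link(peer_root_url: str, path_in_peer: str) -> str:
    """Absolute URL for a file inside a peer publication deployed at peer_root_url."""
    # urljoin replaces the last path segment of the base unless it ends with "/".
    if not peer_root_url.endswith("/"):
        peer_root_url += "/"
    return urljoin(peer_root_url, path_in_peer)


# Hypothetical layout: a Pub A page in PubA/topics/, and Pub B deployed at an example URL.
print(relative_link("PubA/topics", "PubB/common/topics/T1.html"))
print(absolute_link("https://docs.example.com/PubB", "common/topics/T1.html"))
```

Relative links survive moving both publications together; absolute links survive moving Pub A alone. Which one is appropriate is part of the publication specification for each rendition.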

The new keydef for key "pubB-T1" would look something like:

<keydef keys="pubB-T1"
  href="../../PubB/common/topics/T1.html"
  format="html"
  scope="peer"
>
 <topicmeta>
  <navtitle>Topic T1 Title</navtitle>
  <metadata>
   <data name="pubtitle">Pub B Title</data>
  </metadata>
 </topicmeta>
</keydef>
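Generating such a keydef is straightforward with any XML library. Here is a minimal Python sketch with ElementTree that builds an element shaped like the example; the function name is hypothetical, and the "pubtitle" data name is the convention used in this example rather than anything DITA defines.

```python
import xml.etree.ElementTree as ET


def as_published_keydef(key: str, href: str, fmt: str,
                        navtitle: str, pubtitle: str) -> ET.Element:
    """Build an as-published keydef shaped like the example above."""
    keydef = ET.Element("keydef", {"keys": key, "href": href,
                                   "format": fmt, "scope": "peer"})
    topicmeta = ET.SubElement(keydef, "topicmeta")
    ET.SubElement(topicmeta, "navtitle").text = navtitle
    metadata = ET.SubElement(topicmeta, "metadata")
    # "pubtitle" is the name used in the example, not a name defined by DITA.
    ET.SubElement(metadata, "data", {"name": "pubtitle"}).text = pubtitle
    return keydef


kd = as_published_keydef("pubB-T1", "../../PubB/common/topics/T1.html",
                         "html", "Topic T1 Title", "Pub B Title")
ET.indent(kd, space=" ")  # pretty-printing only; needs Python 3.9+
print(ET.tostring(kd, encoding="unicode"))
```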

Note that I've included in the keydef the title of the target topic and the
title of the publication that contains it. This allows the Pub A processor
to generate an appropriately-labeled "cross document" link, e.g.:

  See "<a href="../../PubB/common/topics/T1.html"
target="PubB">Topic T1 Title</a>" in <i>Pub B Title</i>.
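On the Pub A side, rendering a labeled link of this kind only requires reading the href, navtitle, and publication title back out of the keydef. A minimal Python sketch, assuming the "pubtitle" data convention from the example; the function name and the window name passed as `target` are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# The as-published keydef from the example above.
KEYDEF = """<keydef keys="pubB-T1" href="../../PubB/common/topics/T1.html"
  format="html" scope="peer">
 <topicmeta>
  <navtitle>Topic T1 Title</navtitle>
  <metadata><data name="pubtitle">Pub B Title</data></metadata>
 </topicmeta>
</keydef>"""


def cross_document_link(keydef_xml: str, window: str) -> str:
    """Render a labeled HTML link from an as-published keydef."""
    keydef = ET.fromstring(keydef_xml)
    href = keydef.get("href", "")
    # Fall back to the href if the keydef carries no navtitle.
    title = keydef.findtext("topicmeta/navtitle", default=href)
    pubtitle = keydef.findtext("topicmeta/metadata/data[@name='pubtitle']")
    link = (f'"<a href="{html.escape(href)}" target="{html.escape(window)}">'
            f'{html.escape(title)}</a>"')
    if pubtitle:
        return f"{link} in <i>{html.escape(pubtitle)}</i>"
    return link


print("See " + cross_document_link(KEYDEF, "PubB") + ".")
```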

Note also that the location of the HTML rendering of topic T1 reflects its
use within publication Pub B. This means that simply rewriting the URI in
the original keydef in Pub A as authored would not necessarily work, because
there may not be a simple one-to-one relationship between the source
location of topic T1 and the location of T1 as published in the context of a
specific publication, as shown in this example. The same would be true if
Pub B specified @copy-to on its use of T1.

At this point you know where topic T1 *as published* is relative to Pub A
*as published*. You have a working keydef for key "pubB-T1" that you can now
include in publication Pub A in place of the original keydef to topic T1 as
authored. This replacement could be literal, by modifying the map for Pub A,
or it could be virtual, through some intermediate processing that knows to
fetch the keydefs provided by Pub B and swap them in place of the originals.
One way to do this would be to simply process those keydefs before any
others in the original map, which by the keydef precedence rules (the first
definition of a key wins) would make those keydefs the effective ones.
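The "process them first" trick relies only on first-definition-wins precedence. The Python sketch below models that with a deliberately simplified rule over a flat list of keydefs; DITA 1.2's actual precedence also depends on where each definition sits in the map hierarchy, which this ignores.

```python
def effective_keys(keydefs: list[tuple[str, str]]) -> dict[str, str]:
    """Resolve keys with a simplified "first definition wins" rule.

    keydefs is a flat list of (keys attribute, href) in the order the processor
    encounters them. DITA 1.2's real rule also considers where each definition
    sits in the map hierarchy; that is ignored here.
    """
    resolved: dict[str, str] = {}
    for keys_attr, href in keydefs:
        for key in keys_attr.split():  # @keys may name several keys
            resolved.setdefault(key, href)
    return resolved


authored = [("pubB-T1", "../../common/topics/T1.dita")]
as_published = [("pubB-T1", "../../PubB/common/topics/T1.html")]

# Putting the as-published keydefs first makes them the effective definitions.
print(effective_keys(as_published + authored))
```

Because the authored keydef is never removed, the map still works unchanged if the as-published set is not supplied; the swap is purely additive.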

Note that this is inherently a two-pass process: every publication on which
Pub A is dependent must have first been processed in order to construct the
keydefs for the used topics *as published*. Of course, having done that
once, you can cache the results for a given publication until that
publication changes.
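Caching could be keyed on a fingerprint of the peer publication's sources. This Python sketch is one hypothetical way to do it: hash every source file (map and topics) in a stable order and rebuild the as-published keydefs only when the hash changes. A real implementation would also fold the publication specification (filtering, output format, and so on) into the fingerprint.

```python
import hashlib
from typing import Callable


def fingerprint(sources: dict[str, bytes]) -> str:
    """Hash a publication's source files (path -> content) in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(sources):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(sources[path])
        digest.update(b"\0")
    return digest.hexdigest()


class AsPublishedCache:
    """Reuse a publication's as-published keydefs until its sources change."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, dict[str, str]]] = {}

    def get(self, pub: str, sources: dict[str, bytes],
            build: Callable[[], dict[str, str]]) -> dict[str, str]:
        current = fingerprint(sources)
        cached = self._entries.get(pub)
        if cached is not None and cached[0] == current:
            return cached[1]
        keydefs = build()  # the expensive step: actually publishing the peer
        self._entries[pub] = (current, keydefs)
        return keydefs


# Hypothetical usage: count how often Pub B really has to be published.
builds = []


def build_pub_b() -> dict[str, str]:
    builds.append("PubB")
    return {"pubB-T1": "PubB/common/topics/T1.html"}


cache = AsPublishedCache()
src = {"common/topics/T1.dita": b"<topic id='T1'/>"}
cache.get("PubB", src, build_pub_b)
cache.get("PubB", src, build_pub_b)  # unchanged sources: served from the cache
changed = {**src, "common/topics/T1.dita": b"<topic id='T1'>changed</topic>"}
cache.get("PubB", changed, build_pub_b)
print(len(builds))  # 2
```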

To recap:

1. You process Pub A to determine what peer topics it uses from other
publications.

2. You process the other publications to produce the as-published keydefs
needed by Pub A, using the as-referenced keydefs from Pub A.

3. You process Pub A, using the as-published keydefs as provided by the
other publications, producing the final publishable result.
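The three steps can be sketched end to end in Python. Everything here is a hypothetical stand-in: a Publication record holding the peer references as authored and, once published, where each source topic ended up. The point is only the data flow between the passes, not how a real processor publishes anything.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical stand-in for a publication and what its build knows."""
    name: str
    # Peer references as authored: key -> repository path of the target topic.
    peer_refs: dict[str, str] = field(default_factory=dict)
    # Filled in by publishing: repository path of a topic -> its published path.
    published: dict[str, str] = field(default_factory=dict)


def step1_as_referenced(pub: Publication) -> dict[str, str]:
    """Step 1: find the peer topics this publication uses."""
    return dict(pub.peer_refs)


def step2_as_published(peer: Publication, as_referenced: dict[str, str]) -> dict[str, str]:
    """Step 2: the peer reports where each referenced topic was published."""
    return {key: peer.published[src]
            for key, src in as_referenced.items() if src in peer.published}


def step3_resolve(pub: Publication,
                  as_published_sets: list[dict[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Step 3: resolve every peer key; earlier sets win, mirroring key precedence."""
    resolved: dict[str, str] = {}
    unresolved: list[str] = []
    for key in pub.peer_refs:
        for keydefs in as_published_sets:
            if key in keydefs:
                resolved[key] = keydefs[key]
                break
        else:  # no set defined this key
            unresolved.append(key)
    return resolved, unresolved


pub_a = Publication("PubA", peer_refs={"pubB-T1": "common/topics/T1.dita"})
pub_b = Publication("PubB", published={"common/topics/T1.dita": "PubB/common/topics/T1.html"})

refs = step1_as_referenced(pub_a)
from_b = step2_as_published(pub_b, refs)
print(step3_resolve(pub_a, [from_b]))
```

Anything left in the unresolved list corresponds to the processor's complaint in the first pass: it knows the source of the peer topic but not where it was published.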

Note that this is all done using keydefs--no magic data files. The only
potential magic is the swapping in of the as-published keydefs in the second
pass for Pub A, but that swapping in could be done manually, so it's not
magic, only a convenience (although an obvious one).

The scenario as presented so far has only considered one output format,
HTML. In that case, the mapping from references as authored to references as
published is direct: the HTML rendering of each referenced target.

However, a given source publication may be published to any number of forms
and formats, e.g., HTML and PDF.

In that case, it's ambiguous in the general case what the target of any
particular reference *as authored* should be: from the HTML do you want the
HTML or PDF version of a given peer topic?

One easy answer is "like links to like". That's a reasonable business rule
and a good simplifying assumption, but it's not the only possible answer and
thus it's not the right answer for all use cases.

Thus, a general solution has to provide for indicating, on a per-reference
basis, which published instance of a given target topic to point to.

One way to think about this is that when you publish a root map to a
specific target, you generate the as-published keydefs for that specific
rendition, indicating which root map and which processing options were used.
Those options must include any filtering specifications and details about
the rendition target itself, such as the URI of the published location, the
data format, and so on; all of that information could be captured using
normal DITA metadata elements with appropriate name values, or using
specialized element types. I generalize the information about a specific
rendition as the "publication specification" for a given rendition instance:
a root map plus a publication specification uniquely identifies a specific
rendered instance of the root map. You can't reliably address published
versions of maps if you don't have some way to uniquely identify each
published instance.
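One way to make "root map plus publication specification" concrete is an immutable record that can serve as a lookup key for as-published keydef sets. The fields in this Python sketch are illustrative assumptions, not a complete or standard list; the DITAVAL field stands for whatever filtering profile was applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicationSpec:
    """Hypothetical record of everything that distinguishes one rendition.

    A root map plus one of these identifies a single published instance.
    The fields are illustrative, not a complete or standard list.
    """
    root_map: str
    output_format: str             # e.g. "html", "pdf"
    published_root: str            # where this rendition is deployed
    ditaval: Optional[str] = None  # filtering profile, if any


html_b = PublicationSpec("pubB.ditamap", "html", "https://docs.example.com/PubB/")
pdf_b = PublicationSpec("pubB.ditamap", "pdf", "https://docs.example.com/PubB.pdf")

# Frozen dataclasses are hashable, so each rendition can key its own
# as-published keydef set.
as_published = {
    html_b: {"pubB-T1": "https://docs.example.com/PubB/common/topics/T1.html"},
    pdf_b: {"pubB-T1": "https://docs.example.com/PubB.pdf"},
}
```

Two specs that differ in any field (for example, the same map filtered with a different DITAVAL) are distinct renditions and get distinct keydef sets.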

When you go to publish Pub A you now have a choice of keydefs to use for
creating output-specific links in the published result. In a manual process
you would simply edit all the as-published keydef sets, picking and choosing
among the keydefs to create a new keydef set that reflects your desire as
the publisher of Pub A for a specific published instance of Pub A.

For example, you might say, for the HTML, I want links to the HTML versions
of the peer topics, except for two specific links, which need to go to the
PDF because they are links to complex tables that just don't work in HTML.
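That selection, "like links to like" by default plus a few per-key exceptions, is easy to express once the as-published sets are grouped by format. A minimal Python sketch; the keys pubB-T7 and pubB-T9 are hypothetical stand-ins for the two table-heavy topics, and linking to the whole PDF is a simplification, since how to address a single topic inside a PDF is left open here.

```python
def select_keydefs(as_published_by_format: dict[str, dict[str, str]],
                   output_format: str,
                   overrides: dict[str, str]) -> dict[str, str]:
    """Assemble the keydef set for one rendition of Pub A.

    Default rule: like links to like (use the peer rendition whose format matches
    output_format). overrides maps individual keys to a different format.
    """
    selected = dict(as_published_by_format.get(output_format, {}))
    for key, fmt in overrides.items():
        # Raises KeyError if the peer lacks that rendition or that key.
        selected[key] = as_published_by_format[fmt][key]
    return selected


# Hypothetical as-published sets for Pub B; T7 and T9 stand in for the two
# topics with complex tables.
pub_b = {
    "html": {"pubB-T1": "../../PubB/common/topics/T1.html",
             "pubB-T7": "../../PubB/common/topics/T7.html",
             "pubB-T9": "../../PubB/common/topics/T9.html"},
    "pdf": {"pubB-T1": "../../PubB/PubB.pdf",
            "pubB-T7": "../../PubB/PubB.pdf",
            "pubB-T9": "../../PubB/PubB.pdf"},
}

html_links = select_keydefs(pub_b, "html", {"pubB-T7": "pdf", "pubB-T9": "pdf"})
print(html_links)
```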

With your constructed as-published keydefs in hand you can now publish Pub A
and get exactly the result you want.

Again, no magic was required, only manual manipulation of sets of keydefs.

Obviously, that manipulation could be made easier through easy-to-imagine
editing tools that let you quickly select the keydefs to use for a given
output, where simpler business rules (like links to like) are not
sufficient.

You could also define a convention for keydef metadata to indicate, for that
keydef, which format you always want to link to, which is appropriate when
you know the intended rendition target as a map author (which you might, for
example, with a business rule saying that for certain types of information
legal requirements dictate that you link to the PDF and not the HTML version).
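Such a convention could reuse the same topicmeta/data mechanism as the "pubtitle" example. In this Python sketch the data name "link-format" is a hypothetical convention, not anything DITA defines; when a keydef carries it, the pinned format wins, and otherwise the like-links-to-like default applies.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: a <data name="link-format"> entry pins the target format.
PINNED = """<keydef keys="pubB-T5" href="../../common/topics/T5.dita" scope="peer">
 <topicmeta><metadata><data name="link-format" value="pdf"/></metadata></topicmeta>
</keydef>"""
UNPINNED = """<keydef keys="pubB-T1" href="../../common/topics/T1.dita" scope="peer"/>"""


def target_format(keydef_xml: str, output_format: str) -> str:
    """Format to link to: the pinned one if the author set it, else like-to-like."""
    keydef = ET.fromstring(keydef_xml)
    pinned = keydef.find("topicmeta/metadata/data[@name='link-format']")
    if pinned is not None and pinned.get("value"):
        return pinned.get("value")
    return output_format


print(target_format(PINNED, "html"))    # pdf
print(target_format(UNPINNED, "html"))  # html
```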

You could also use output-specific conditional processing to have
format-specific variants of the same key definition (see the
d4p_renditionTarget attribute domain in the DITA for Publishers vocabulary,
which I created for specifically this purpose).

So again we can address (almost) all the requirements using existing DITA
1.2 mechanisms and clever use of keydefs.

Cheers,

Eliot

-- 
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press,
http://xmlpress.net/publications/dita/practitioners-1/