Subject: Cross-Deliverable Links and Key Resolution
From: Eliot Kimber <ekimber@contrext.com>
To: dita <dita@lists.oasis-open.org>
Date: Thu, 04 Dec 2014 12:45:38 -0600

I think there may be some confusion about how I intended cross-deliverable
references to be processed. This was captured in the discussion that
Michael and I had about how to implement cross-deliverable link production,
but since that concept didn't get included in the 1.3 spec, I think most
people have not paid attention to it.

I think the relevant aspect of cross-deliverable linking for this
discussion is that the facility as specified explicitly does not require
that you know for sure that a given peer key will actually be defined *at
the time you author the link*. The reason for this is that the
publications involved may be developed and produced asynchronously and
with little coordination. Thus the keys you want to link to may not in
fact have been literally defined at the time you author the links.
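
As an illustration, a link of this kind might be authored as follows. This
is only a sketch: it assumes the peer publication is declared through a map
reference with scope="peer" and a key scope name, and the file names and the
scope name "pub-02" are made up for the example. Nothing here requires that
"chapter-01" already be defined in publication 02 when the link is written.

```xml
<!-- publication-01.ditamap: hypothetical linking publication -->
<map>
  <title>Publication 01</title>
  <!-- Declares publication 02 as a peer and gives its keys the
       scope name "pub-02". The target map need not be available,
       or complete, at authoring time. -->
  <mapref keyscope="pub-02"
    scope="peer"
    href="../publication-02/publication-02.ditamap"
    processing-role="resource-only"
  />
  <topicref href="overview.dita"/>
  <!-- Inside overview.dita, the cross-deliverable link would be:
       <xref keyref="pub-02.chapter-01"/> -->
</map>
```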

You can only know if a given key is or is not resolvable when you produce
the final deliverable of the document doing the linking: between the time
you author the link and when you produce the final form, anything could
happen to the target document.

In addition, if you use the generic key-based implementation approach that
Michael and I developed, all references to peer keys become local key
references when you produce the final deliverable so normal key resolution
rules apply during that final deliverable production process.

For all these reasons, peer key references simply have to be ignored for
the purpose of determining whether or not a key is resolvable as long as
you're not producing the final-form deliverable.

The reason for this distinction between production of the
final-form deliverable and any other processing you might be doing is
that resolving cross-deliverable links conceptually requires a multi-pass
process, and that's how a lot of processors will implement it. In
particular, any amount of time can elapse between when
you do pass 1, as described below, and when you do pass 2: there is no
requirement that they be performed together in time. Therefore I think we
can reasonably expect that most processors will actually reflect these two
passes in their implementations.

The passes are:

Pass 1: Each publication involved in cross-deliverable linking is
processed once to determine, *for that publication*, what deliverable
anchors any keys become for that deliverable. This mapping of
keys-to-deliverable-addresses is saved for use in subsequent passes (it
was the details of how this data could be saved that Michael and I
discussed and arrived at the proposed interchange solution of using
intermediate key definitions).

For example, if topic "topic-01.dita" is referenced by the topicref:

 <topicref keys="chapter-01" href="topic-01.dita"/>

in the map and if for HTML output the result is HTML file
"chapter-01.html", then the deliverable-specific key-to-anchor mapping
would be "key 'chapter-01' maps to HTML file 'chapter-01.html'" for this
deliverable. This mapping can be represented by a normal key definition of
the form:

<keydef keys="chapter-01"
  href="../../publication-02/chapter-01.html"
  scope="external"
  format="html"
/>

Pass 2: Each publication involved in cross-deliverable linking is
processed again, this time using the deliverable-specific key-to-anchor
mappings for each of the target publications to resolve any key references
to those publications.

Note that pass 1 does not *require* that any target peer maps be available
because you're only concerned with keys within each publication (that is,
generating that publication's key-to-anchor map).

It is not until pass 2 that the processor has to be able to resolve the
cross-deliverable keys and that is the point at which failures can and
should be reported.

Note also that there is an inherently loose coupling between these two
phases: in the general case you don't know when or if any given target
deliverable will itself be available and therefore you don't necessarily
know during pass 1 processing if a given key will or won't actually be
resolvable when you go to do pass 2. You might have authored links to a
key that you expect will be defined but doesn't happen to be defined in
the target publication at authoring time. As long as that key is defined
and resolvable when you do pass 2, it's all good.

Thus, there can be processing contexts in which it is not known, and
doesn't need to be known, whether a peer key reference can be resolved,
namely the pass-1 processing for each publication.

However, *if the peer maps are available*, processors certainly can check
the key definitions if they choose to and report any unresolved keys. But how you
manage your related publications relative to each other and the generation
of deliverables is entirely a business decision: you could impose very
tight controls or very loose controls, depending on what you need.

The DITA-defined aspects of the process accommodate both loose and tight
control.

For that reason, we cannot state the rule for peer keys as "if you can't
resolve the key it is treated as an unresolvable key" because there are
now valid processing contexts where you simply don't know if the key is or
is not resolvable.

I think the rule has to be stated in terms of producing final
deliverables: at that point, the normal unresolvable key rules should
apply. 

But, there's more:

The general mechanism Michael and I arrived at uses intermediate key
definitions as the way of capturing the key-to-anchor binding, as shown
above. 

The basic idea is that in pass 1 you generate a set of key definitions
that reflect the key-to-anchor binding for the deliverable you're
creating. These keys are declared with scope="external" and with a format
reflecting the target deliverable (e.g., format="html", format="pdf" or
whatever it is).
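
A complete pass-1 result for one deliverable could then be a small map
holding nothing but such key definitions. The sketch below assumes HTML
output; the map file name and the second key are hypothetical, and the href
values simply follow the pattern of the keydef shown earlier.

```xml
<!-- publication-02-html-keys.ditamap: hypothetical pass-1 output
     recording where each key of publication 02 lands in its HTML
     deliverable -->
<map>
  <title>Publication 02: key-to-anchor bindings (HTML)</title>
  <keydef keys="chapter-01"
    href="../../publication-02/chapter-01.html"
    scope="external"
    format="html"
  />
  <keydef keys="chapter-02"
    href="../../publication-02/chapter-02.html"
    scope="external"
    format="html"
  />
</map>
```

A PDF deliverable of the same publication would get its own map of this
kind, with format="pdf" and hrefs pointing into the PDF.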

In pass 2, each publication that links to that deliverable literally
includes those keys before any locally-defined keys so that the
deliverable-specific keys take precedence.

In this scenario, during pass 2 processing the key definitions are now
local to the publication making the cross-deliverable link, not peer, and
so normal key processing rules apply: either the key is defined and it's
all good or it's not and normal undefined key rules apply.
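
One way the pass-2 input for a linking publication might look is sketched
below. It is an assumption, not something the mechanism mandates, that the
generated map is pulled in under a key scope name ("pub-02" here) so that
references of the form "pub-02.chapter-01" resolve; the file names are
hypothetical too.

```xml
<!-- publication-01.ditamap as processed in pass 2 (hypothetical) -->
<map>
  <title>Publication 01</title>
  <!-- The generated key-to-anchor bindings come first, ahead of any
       locally defined keys, so they take precedence. This reference
       is local, not peer, so normal key resolution applies. -->
  <mapref keyscope="pub-02"
    href="generated/publication-02-html-keys.ditamap"
    processing-role="resource-only"
  />
  <topicref href="overview.dita"/>
</map>
```

If "chapter-01" is missing from the generated map, the reference in
overview.dita is simply an undefined key at this point and can be reported
as such.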

Given this implementation approach, it should be clear that processors
should ignore peer key references, at least for the purposes of applying
unresolvable key rules, because they can't know for sure if the key is or
is not resolvable in the general case.

However, DITA users can choose to impose a rule that all peer maps must be
available during pass 1 processing and that they should reflect the final
set of keys that will be available in that publication. This is the
"tightly controlled interlinked publication set" use case, e.g., what
might be provided by a CCMS that manages the authoring and publication of
all the publications in a related set, enforcing specific business rules
for release and publication. (This was the use case I typically had in
mind when thinking about this problem, e.g., the "all-knowing publication
process manager".)
 
In that case processors can check the resolvability of peer key references
early and report failures or treat the references as unresolvable during pass 1 (or at
some appropriate workflow checkpoint where it is required that all links
be resolvable). But that is an implementation and business rule choice
that is not inherent in the cross-deliverable link mechanism and that
cannot be mandated by the standard.

Note that Michael had a completely different and equally-valid use case in
mind: the "disconnected and lightly coordinated interlinked document set",
where publications that link to each other are managed by different groups
with very little direct coordination other than the interchange of the
key-to-anchor maps necessary to produce publications that link to other
publications.

In the context of Michael's use case, it should be clear that trying to
enforce key resolvability during pass 1 is simply not generally useful or,
in some cases, not possible, because you simply don't have the required
key-to-anchor mapping during initial authoring or maybe not until you do
final deliverable generation for publication.

In this disconnected case you might expect owners of documents to also
interchange maps that provide just the key definitions to which other
publications are allowed to link. In that case, early validation of key
references would be possible. But again, this level of coordination is not
required by the facility as specified or intended.
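
Such an interchanged map could be as small as the sketch below. The file
name and key names are hypothetical, and leaving the hrefs off is just one
possible choice for a map whose only job is to announce which key names
exist; an owner could equally bind each key to its source topic.

```xml
<!-- publication-02-public-keys.ditamap: hypothetical map that the
     owners of publication 02 hand to other groups. It lists only
     the keys that other publications are allowed to link to. -->
<map>
  <title>Publication 02: linkable keys</title>
  <keydef keys="chapter-01"/>
  <keydef keys="chapter-02"/>
</map>
```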

Cheers,

E.
—————
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com