Expanding your Cross Reference Horizons

or How to Link Between Documents


Table of Contents

How to link between documents
Details to watch out for
Useful variations
Naming your data files
Using Makefiles
Using XInclude
Using catalogs
Modifying olink generated text
Adding the document title

When writing technical documentation, it is often necessary to cross reference to other information. When that other information is in the current document, then DocBook provides support with the <xref> and <link> elements. But if the information is in another document, you cannot use those elements because their linkend attribute value must point to an id attribute value that is in the current document.

The <olink> element is the equivalent for linking outside the current document. It has an attribute for specifying a document identifier (targetdoc) as well as the id of the target element (targetptr). The combination of those two attributes provides a unique identifier to locate cross references.

Note

The <olink> element has another set of attributes that support an older style of cross referencing using system entities. Those other olink attributes are targetdocent, linkmode, and localinfo. Those attributes are not used in the olink mechanism described here.

But how are external cross references resolved? Internal cross references are easy. When a document is parsed, it is loaded into memory and all of its linkends can be connected to ids within memory. But external documents are not loaded into memory, so there must be another mechanism for resolving olinks. The simplest mechanism would be to open each external document, find the target id, and resolve the cross reference. But such a mechanism would not scale well. It would require parsing a potentially large document to find one target, and then repeating that for as many olinks as you have. A more efficient mechanism would parse each document once and save the cross reference target information in a separate target database that can be loaded into memory for quick lookup.
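The "parse once, save the targets" idea can be sketched in a few lines of Python. This only illustrates the concept and is not how the stylesheets implement it. The function name collect_targets and the dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Sample document, loosely based on the Admin Guide example used in this article.
ADMIN_GUIDE = """<book id="adminguide">
  <title>Mail Administrator's Guide</title>
  <chapter id="user_accounts">
    <title>Administering User Accounts</title>
    <para>blah blah</para>
  </chapter>
</book>"""


def collect_targets(doc_id, xml_text):
    """Parse a document once and return {(doc_id, element id): title or None}."""
    root = ET.fromstring(xml_text)
    targets = {}
    for elem in root.iter():
        target_id = elem.get("id")
        if target_id is not None:
            # findtext looks only at direct children, so each element gets its own title.
            targets[(doc_id, target_id)] = elem.findtext("title")
    return targets


# The saved table can then answer any number of lookups without reparsing.
database = collect_targets("MailAdminGuide", ADMIN_GUIDE)
print(database[("MailAdminGuide", "user_accounts")])
```

The key is the pair of document id and element id, which mirrors the targetdoc and targetptr attributes of an olink.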

The DocBook XSL stylesheets use such an external cross reference mechanism to resolve olinks. You first process all of your documents in a mode that collects the target information, and then you can process them in the normal mode to produce HTML or print output. The different processing mode is controlled using XSL stylesheet parameters.

How to link between documents

To use olinks to form cross references between documents, you have to spend a little time setting up your files so they can find each other's information. This section describes how to do that. Four of these six steps are performed only once, after which only two steps are required to process your documents as needed.

Procedure 1. Using olink

  1. Identify the documents

    Decide which documents are to be included in the domain for cross referencing, and assign a document id to each. A document id is a name string that is unique for each document in your collection. Your naming scheme can be as simple or elaborate as your needs require.

    For example, you might be writing mail agent documentation that includes a user's guide, an administrator's guide, and a reference document. These could be assigned simple document ids such as ug, ag, and ref, respectively. But if you expect to also cross reference to other user guides, you might need to be more specific, such as MailUserGuide, MailAdminGuide, and MailReference.

    You can add new documents to a collection at any time. You can also have more than one collection, each of which defines a domain of documents among which you can cross reference. A given document can be in more than one collection.

  2. Add olinks to your documents

    Insert an <olink> element where you want to form a cross reference to another document. You supply two attributes in each olink: targetptr is the id value of the element you are pointing to, and targetdoc is the document id that contains the element.

    For example, the Mail Administrator's Guide might have a chapter on user accounts like this:

    <chapter id="user_accounts">
    <title>Administering User Accounts</title>
    <para>blah blah</para>
    ...

    You can form a cross reference to that chapter in the Admin Guide by adding an olink in the User's Guide like this:

    You may need to update your
    <olink targetdoc="MailAdminGuide" targetptr="user_accounts">user accounts</olink>
    when you get a new machine.

    When the User's Guide is processed into HTML, the text user accounts will become a hot spot that links to the Admin Guide.

    If instead you create an empty olink element with the same attributes, then the hot text will be generated by the stylesheet from the title in the other document. In this example, the hot text would be Administering User Accounts. This has the advantage of being automatically updated when the title in the Admin Guide is updated.

  3. Generate target data files

    For each document in your collection, you generate a data file that contains all the potential cross reference targets. You do that by processing the document using your regular DocBook XSL stylesheet but with an additional parameter set. Here are examples showing the syntax for three different XSL processors for the same document.

    libxml2:
    xsltproc --param collect.xref.targets "'only'" docbook.xsl userguide.xml
    
    Saxon:
    java com.icl.saxon.StyleSheet userguide.xml docbook.xsl collect.xref.targets="only"
    
    Xalan:
    java org.apache.xalan.xslt.Process -IN userguide.xml -XSL docbook.xsl -PARAM collect.xref.targets "only"
    

    This command should generate in the current directory a target data file, named target.db by default. You can change the filename by setting the parameter targets.filename. The generated file is an XML file that contains only the information needed to form cross references to each element in the document.

    The DocBook XSL stylesheets contain the code needed to generate the target data file. The parameter collect.xref.targets controls how that code is applied, and has three possible values.

    no

    Don't generate the target data file. This is the default, because documents without olinks don't need the extra processing step. Use this setting when you want to process just your document for output without first regenerating the target data file.

    yes

    Generate the target data file, and then process the document for output. Use this setting when you change your document and want to regenerate both the target data file and the output.

    only

    Generate the target data file, but don't process the document for output. Use this setting when you want to update the target data file for use by other documents, or when you set things up for the first time.

    In the command examples above, docbook.xsl should be the pathname to the DocBook stylesheet file you normally use to process your document for output. For example, that might be:

    /usr/lib/docbook/docbook-xsl-1.48/html/docbook.xsl

    If you use the DocBook chunking feature, then it would be the path to chunk.xsl instead. If you use a DocBook XSL customization file, then it should be the pathname to that file. It will work if your customization file imports either docbook.xsl or chunk.xsl, and it will pick up whatever customizations you have for cross reference text. If you use different stylesheet variations for different documents, be sure to use the right one for each document. For example, you might use chunking on some long documents, but not on short documents. Use Makefiles or batch files to keep it all consistent.

  4. Decide on your HTML output hierarchy

    To form cross references between documents in HTML, their relative locations must be known. Generally, the HTML files for multiple documents are output to different directories, particularly if chunking is used. So before going any further, you must decide on the names and arrangement of the HTML output directories for all the documents in your collection.

    Here are the output directories for our example docs:

    documentation
        |
        |-- guides
        |      |-- mailuser      contains MailUserGuide files
        |      |-- mailadmin     contains MailAdminGuide files
        | 
        |-- reference
               |-- mailref       contains MailReference files

    It is only the relative location that counts; the top level name is not used. The stylesheet will compute the relative path for cross reference URLs using the relative locations.

  5. Create the target database document

    Each collection of documents has a master target database document that is used to resolve all olinks in that collection. The target database document is an XML file that is created once, by hand. It provides a framework that pulls in the target data for each of the documents in the collection. Since all the document data is pulled in dynamically, the database document itself is static, except for changes to the collection.

    Here is an example target database document, which is named targetdb.xml by default. It structures the documents in the collection into a sitemap element that provides the relative locations of the outputs for HTML. Then it pulls in the individual target data using system entity references to the files generated in step 3 above.

    <?xml version="1.0"?>
    <!DOCTYPE targetset SYSTEM "/tools/docbook-xsl-1.48/common/targetdatabase.dtd" [
    <!ENTITY ugtargets SYSTEM "/doc/userguide/target.db"> <!-- 1 -->
    <!ENTITY agtargets SYSTEM "/doc/adminguide/target.db">
    <!ENTITY reftargets SYSTEM "/doc/man/target.db">
    ]>
    <targetset> <!-- 2 -->
      <targetsetinfo> <!-- 3 -->
        Description of this target database document,
        which is for the examples in olink doc.
      </targetsetinfo>
    
      <!-- Site map for generating relative paths between documents -->
      <sitemap> <!-- 4 -->
        <dir name="documentation"> <!-- 5 -->
          <dir name="guides"> <!-- 6 -->
            <dir name="mailuser"> <!-- 7 -->
              <document targetdoc="MailUserGuide" baseuri="userguide.html"> <!-- 8, 9 -->
                &ugtargets; <!-- 10 -->
              </document>
            </dir>
            <dir name="mailadmin">
              <document targetdoc="MailAdminGuide">
                &agtargets;
              </document>
            </dir>
          </dir>
          <dir name="reference">
            <dir name="mailref">
              <document targetdoc="MailReference">
                &reftargets;
              </document>
            </dir>
          </dir>
        </dir>
      </sitemap>
    </targetset>
         
    
    1

    Declare a system entity for each document target data file.

    2

    Root element for the database is targetset.

    3

    The targetsetinfo element is optional, and contains a description of the collection.

    4

    The sitemap element contains the framework for the hierarchy of HTML output directories.

    5

    Directory that contains all the HTML output directories.

    6

    Directory that contains only other directories, not documents.

    7

    Directory that contains the output of one or more documents.

    8

    The document element has the document identifier in its targetdoc attribute.

    9

    For documents processed without chunking, the output filename must be provided in the baseuri attribute since that name is not generated by the document itself. Then cross references can be resolved using the form userguide.html#targetptr.

    10

    The system entity reference pulls in the target data for this document.

    When this document is processed, each target.db file is pulled into its proper location in the hierarchy using its system entity reference, thus forming the complete cross reference database. That makes all the information available to the XSL stylesheets, which look up olink references and resolve them using the information in the database.

    The use of system entities permits the individual target.db data files for each document to be updated as needed, and the database automatically gets the update the next time it is processed.

    System entities also permit the use of XML or SGML catalogs to resolve the location of the various data files.

  6. Process each document for output

    Now all that remains is to process each document to generate its output. That's done using the normal XSL DocBook stylesheet with an additional parameter, the database filename. The DocBook XSL stylesheets (version 1.XX and higher) know how to resolve olinks using the target database.

    Here are command examples for three XSL processors:

    libxml2:
    xsltproc --param target.database.document "'/projects/mail/targetdb.xml'" \
             --output /http/guides/mailuser/userguide.html \
             docbook.xsl userguide.xml
    
    Saxon:
    java com.icl.saxon.StyleSheet -o /http/guides/mailuser/userguide.html \
             userguide.xml docbook.xsl \
             target.database.document="/projects/mail/targetdb.xml"
             
    Xalan:
    java org.apache.xalan.xslt.Process \
             -PARAM target.database.document "/projects/mail/targetdb.xml" \
             -OUT /http/guides/mailuser/userguide.html \
             -IN userguide.xml -XSL docbook.xsl 
             

    The only difference from the normal document processing is the addition of the parameter target.database.document, which provides the location of the target database file. As your document is processed, when the stylesheet encounters an olink that has targetdoc and targetptr attributes, it looks up the values in the target database and resolves the reference. If it cannot open the database or find a particular olink reference, then it reports an error.
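The way a sitemap (steps 4 and 5) turns into a relative link (step 6) can be sketched in Python. This is a simplified illustration, not the stylesheets' code. It handles only documents that have a baseuri attribute and does not read the target data files at all. The function names and the targetptr value intro are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# The sitemap from the example database, without the entity references.
TARGETDB = """<targetset>
  <sitemap>
    <dir name="documentation">
      <dir name="guides">
        <dir name="mailuser">
          <document targetdoc="MailUserGuide" baseuri="userguide.html"/>
        </dir>
        <dir name="mailadmin">
          <document targetdoc="MailAdminGuide"/>
        </dir>
      </dir>
      <dir name="reference">
        <dir name="mailref">
          <document targetdoc="MailReference"/>
        </dir>
      </dir>
    </dir>
  </sitemap>
</targetset>"""


def document_locations(targetset_xml):
    """Map each targetdoc to (output directory path, baseuri or None)."""
    locations = {}

    def walk(node, path):
        for child in node:
            if child.tag == "dir":
                walk(child, path + [child.get("name")])
            elif child.tag == "document":
                locations[child.get("targetdoc")] = ("/".join(path), child.get("baseuri"))

    walk(ET.fromstring(targetset_xml).find("sitemap"), [])
    return locations


def olink_href(locations, current_doc, targetdoc, targetptr):
    """Build a relative URL from current_doc's output directory to the target."""
    if targetdoc not in locations:
        raise LookupError("unknown targetdoc: " + targetdoc)
    here, _ = locations[current_doc]
    there, baseuri = locations[targetdoc]
    if baseuri is None:
        # Chunked output needs per-target filenames, which this sketch does not model.
        raise LookupError("no baseuri for " + targetdoc)
    return posixpath.join(posixpath.relpath(there, start=here), baseuri) + "#" + targetptr


locations = document_locations(TARGETDB)
print(olink_href(locations, "MailReference", "MailUserGuide", "intro"))
```

Because only relative positions are compared, renaming the top-level documentation directory would not change the computed link.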

Details to watch out for

Olinks provide the tremendous power of cross referencing between documents, but they have a price. Olinks introduce dependencies between documents that are not an issue with standalone documents. The documents in a collection must "play together", and so they must follow a few rules.

  • If you change a document, you should always regenerate its target.db data file. Once a collection is set up, this step is most easily done by processing a modified document once with the parameter collect.xref.targets set to the value yes. That will make two passes through the document, the first to regenerate the target data file and the second to generate the normal output.

  • If you change a document, then you may need to regenerate other documents that make cross references to that data file. Such dependencies are most easily tracked using Makefiles so the update process can be automated.

  • The output locations specified in the sitemap element in the target database document must match where the HTML output actually lands. If they don't match, then the hot links you generate between documents won't reach the actual documents.

  • Whichever DocBook stylesheet (standard or customized) you use to process a document for output should also be used to extract its target data. Only then can you be sure that the style and content of the cross references will match the document. However, if you want the generated olink text to differ from the original, you can use different stylesheets. See “Modifying olink generated text” for more on that option.
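The second rule above, finding which documents depend on a changed one, can be sketched by scanning each source for olink elements. This is a rough Python illustration under simplifying assumptions: the sources are held in a dictionary keyed by document id, and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Tiny stand-ins for the real documents, keyed by their document ids.
SOURCES = {
    "MailUserGuide": """<book><chapter><para>You may need to update your
        <olink targetdoc="MailAdminGuide" targetptr="user_accounts">user accounts</olink>
        when you get a new machine.</para></chapter></book>""",
    "MailAdminGuide": "<book><chapter id='user_accounts'><title>Administering User Accounts</title></chapter></book>",
    "MailReference": "<book><chapter><para>No olinks here.</para></chapter></book>",
}


def referenced_documents(xml_text):
    """Return the set of targetdoc values used by olinks in one document."""
    root = ET.fromstring(xml_text)
    return {o.get("targetdoc") for o in root.iter("olink") if o.get("targetdoc")}


def dependents(changed_doc, sources):
    """Return the ids of the documents that olink into changed_doc."""
    return sorted(
        doc_id
        for doc_id, text in sources.items()
        if doc_id != changed_doc and changed_doc in referenced_documents(text)
    )


print(dependents("MailAdminGuide", SOURCES))
```

A list like this is the information a Makefile needs for its prerequisites.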

Useful variations

The olink system has some flexibility to adapt to your particular needs.

Naming your data files

You can tell the stylesheet what to name the generated data file for a document by using the targets.filename parameter. This is useful when you have more than one XML document in the same directory and you need to give them separate data files.

Using the Saxon example:

java com.icl.saxon.StyleSheet userguide.xml docbook.xsl \
      collect.xref.targets="only" \
      targets.filename="mytargetfile"

Be sure to specify the same filename for that document when you create your master targetdb.xml target database document. It's used in the system entity declaration for that document.

Likewise, you can choose your own name for the master database file. It is used when you want to resolve olinks, so you can pass the name as a parameter:

java com.icl.saxon.StyleSheet userguide.xml docbook.xsl \
      target.database.document="mytargetdatabase.xml"
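If you drive the processor from a script, the parameters can be assembled programmatically. The following Python sketch builds xsltproc argument lists rather than Saxon ones, passing each value with xsltproc's --stringparam option. The helper name xsltproc_command is an assumption for this example, and nothing is executed.

```python
import shlex


def xsltproc_command(stylesheet, document, params):
    """Build an xsltproc argument list that passes each parameter as a string."""
    command = ["xsltproc"]
    for name, value in params.items():
        command += ["--stringparam", name, value]
    return command + [stylesheet, document]


# Collect targets into a custom data file name.
collect = xsltproc_command(
    "docbook.xsl",
    "userguide.xml",
    {"collect.xref.targets": "only", "targets.filename": "mytargetfile"},
)

# Resolve olinks against a custom master database name.
resolve = xsltproc_command(
    "docbook.xsl",
    "userguide.xml",
    {"target.database.document": "mytargetdatabase.xml"},
)

print(shlex.join(collect))
print(shlex.join(resolve))
```

A list of arguments like this can be handed to subprocess.run without any shell quoting concerns.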

Using Makefiles

Olinks create dependencies between documents, and Makefiles are good at tracking dependencies. Here is a simple Makefile for one of the example documents.

SINGLESTYLE = /tools/xsl/docbook-xsl-1.49/docbook.xsl
CHUNKSTYLE = /tools/xsl/docbook-xsl-1.49/chunk.xsl

UGOUTPUT = /http/guides/mailuser
AGOUTPUT = /http/guides/mailadmin
REFOUTPUT = /http/reference/mailref

AdminTargets = ../adminguide/target.db
RefTargets = ../man/target.db

html:  $(UGOUTPUT)/userguide.html

$(UGOUTPUT)/userguide.html : userguide.xml target.db $(AdminTargets) $(RefTargets)
        java com.icl.saxon.StyleSheet -o $(UGOUTPUT)/userguide.html \
             userguide.xml $(SINGLESTYLE) \
             target.database.document="/projects/mail/targetdb.xml"

target.db : userguide.xml
        java com.icl.saxon.StyleSheet userguide.xml $(SINGLESTYLE) \
             collect.xref.targets="only" \
             targets.filename="target.tmp"
        diff target.db target.tmp > /dev/null 2>&1 || \
             cp target.tmp target.db
        rm target.tmp
         

In the target.db rule, the target data is saved to a temporary file first. If it differs from the existing target.db file, then it overwrites it. If it does not differ, then the file is not overwritten and its timestamp is left unchanged. Any document changes that don't affect cross reference targets would thus not update the data file. That would prevent unnecessary processing of other documents that have a dependency on its cross reference data.
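The same "overwrite only if different" logic can be written in Python for build scripts that do not use make. This is a minimal sketch; the function name update_if_changed is invented for the example, and the file contents in the demo are placeholders.

```python
import filecmp
import os
import shutil
import tempfile


def update_if_changed(new_file, current_file):
    """Replace current_file with new_file only when their contents differ.

    Returns True when current_file was written, and False when it was left
    alone, in which case its timestamp stays unchanged.
    """
    if os.path.exists(current_file) and filecmp.cmp(new_file, current_file, shallow=False):
        return False
    shutil.copyfile(new_file, current_file)
    return True


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        current = os.path.join(workdir, "target.db")
        fresh = os.path.join(workdir, "target.tmp")
        with open(fresh, "w") as handle:
            handle.write("placeholder target data")
        first = update_if_changed(fresh, current)   # target.db does not exist yet
        second = update_if_changed(fresh, current)  # contents are now identical
        return first, second


print(demo())
```

The first call copies the file because there is nothing to compare against; the second call leaves it untouched.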

Using XInclude

You can use XInclude instead of system entities in the targetdb.xml database file. That has the advantage of not requiring system entity declarations before they are used. An XInclude can just specify a path directly to a data file. You also would not need the DOCTYPE document type declaration and the DTD. But you would need an XSL processor that handles XIncludes.

Here is a portion of the example database using XInclude:

<?xml version="1.0"?>
<targetset>
  <targetsetinfo>
    Description of this target database document,
    which is for the examples in olink doc.
  </targetsetinfo>

  <!-- Site map for generating relative paths between documents -->
  <sitemap>
    <dir name="documentation">
      <dir name="guides"> 
        <dir name="mailuser">
          <document targetdoc="MailUserGuide" baseuri="userguide.html">
            <xi:include href="/doc/userguide/target.db"  
                        xmlns:xi="http://www.w3.org/2001/XInclude"/>
          </document>
        </dir>
        ...
     

The path to the data file is in the href attribute. You must also declare the XInclude namespace in each include element.
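To see what an XInclude-aware processor does with such a file, here is a small Python sketch using the standard library's xml.etree.ElementInclude module. It is only an illustration of the expansion step. The in-memory loader and the placeholder content standing in for a target.db file are assumptions made so the example needs no files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TARGETDB = """<targetset>
  <sitemap>
    <dir name="documentation">
      <document targetdoc="MailUserGuide" baseuri="userguide.html">
        <xi:include href="/doc/userguide/target.db"
                    xmlns:xi="http://www.w3.org/2001/XInclude"/>
      </document>
    </dir>
  </sitemap>
</targetset>"""

# Placeholder content standing in for the real data file.
FAKE_FILES = {"/doc/userguide/target.db": '<div targetptr="intro"/>'}


def load_from_memory(href, parse, encoding=None):
    """Loader used instead of the default one, which would read from disk."""
    return ET.fromstring(FAKE_FILES[href])


root = ET.fromstring(TARGETDB)
ElementInclude.include(root, loader=load_from_memory)

# The xi:include element has been replaced by the content it pointed to.
document = root.find("./sitemap/dir/document")
print([child.tag for child in document])
```

After expansion the document element contains the included data directly, just as it would with a system entity reference.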

Using catalogs

Catalog files let you map logical names to actual pathnames on a filesystem. That provides greater flexibility, reduces maintenance, and makes your system more portable, since you can just edit catalog files to make changes in locations. These features become more important when you have cross references between documents because the cross reference data files have to be locatable.

There are two kinds of catalog files now, SGML catalogs and XML catalogs. Here are examples of both types.

SGML catalogs

If you use system entities in your target database, then you can simplify that file and let the catalog resolve the actual paths to the target data files.

/projects/mail/targetdb.xml:
<?xml version="1.0"?>
<!DOCTYPE targetset SYSTEM "targetdatabase.dtd" [
<!ENTITY ugtargets SYSTEM "ugtargets">
<!ENTITY agtargets SYSTEM "agtargets">
<!ENTITY reftargets SYSTEM "reftargets">
]>

/tools/catalog:
SYSTEM "targetdatabase.dtd" "/tools/docbook-xsl-1.48/common/targetdatabase.dtd" 
SYSTEM "ugtargets" "/doc/userguide/target.db"
SYSTEM "agtargets" "/doc/adminguide/target.db"
SYSTEM "reftargets" "/doc/man/target.db"

Your processing commands will have to change to make use of the catalog. Here is how it would be used with xsltproc, in which you set an environment variable for the catalog path and add the --catalogs option so that the SGML catalog is consulted:

export SGML_CATALOG_FILES=/tools/catalog
xsltproc --catalogs \
         --param target.database.document "'/projects/mail/targetdb.xml'" \
         --output /http/guides/mailuser/userguide.html \
         docbook.xsl userguide.xml

When the processor reads the targetdb.xml file, it will use the catalog to resolve the locations of the system entities for the target data files.

XML catalogs

An XML catalog that provides the same mapping as the above SGML catalog would look like this:

/tools/catalog.xml:
<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"
>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="targetdatabase.dtd" uri="/tools/docbook-xsl-1.48/common/targetdatabase.dtd" />
  <system systemId="ugtargets" uri="/doc/userguide/target.db" />
  <system systemId="agtargets" uri="/doc/adminguide/target.db" />
  <system systemId="reftargets" uri="/doc/man/target.db" />
</catalog>

To use an XML catalog with xsltproc, you just use a different environment variable:

export XML_CATALOG_FILES=/tools/catalog.xml
xsltproc --param target.database.document "'/projects/mail/targetdb.xml'" \
         --output /http/guides/mailuser/userguide.html \
         docbook.xsl userguide.xml

Other processors use different syntax for catalog processing. You can learn a lot about them in the web article “XML Entity and URI Resolvers”.
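The mapping a catalog performs is simple to picture. The Python sketch below reads system entries from an XML catalog and resolves a system identifier to a path. It is a deliberately reduced model, since real resolvers support many more entry types. The function names are assumptions for this example.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="ugtargets" uri="/doc/userguide/target.db" />
  <system systemId="agtargets" uri="/doc/adminguide/target.db" />
  <system systemId="reftargets" uri="/doc/man/target.db" />
</catalog>"""


def system_map(catalog_xml):
    """Return {systemId: uri} for every system entry in the catalog."""
    root = ET.fromstring(catalog_xml)
    return {
        entry.get("systemId"): entry.get("uri")
        for entry in root.iter("{%s}system" % CATALOG_NS)
    }


def resolve(system_id, mapping):
    """Return the mapped location, or the identifier itself when unmapped."""
    return mapping.get(system_id, system_id)


mapping = system_map(CATALOG)
print(resolve("agtargets", mapping))
```

Moving a data file then means editing one uri value in the catalog instead of every database document that refers to it.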

Modifying olink generated text

Olinks that don't contain any text must have the content generated by the stylesheet. By default, the text that is generated is the same text you would get if you had used an xref element to an internal target in the other document. For example, an xref to a chapter might generate text like this:

Chapter 3: "Using a Mouse"

For an olink to that same chapter from another document, you might not want the Chapter 3 text to be included, because readers might think it was referring to the third chapter of the document they were reading, when instead it is referring to the third chapter in another document.

The target data file for each document contains both the generated text and the pieces of text for each target. That is, it contains the full string as above, as well as separate fields for the title, subtitle, element type, and number. That permits you to assemble the generated text in a different way using a different stylesheet.

There is a set of text templates in each language file in the common directory for resolving olinks, using the text pieces from the data file. You can turn on that feature by setting the parameter use.local.olink.style to a value of 1 instead of the default zero.
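Assembling the link text from separate pieces can be pictured with ordinary string templates. The Python sketch below is only an analogy. The field names follow the pieces listed above, but the template syntax is Python's own and not the syntax used in the DocBook language files.

```python
# Pieces of text as they might be stored for one target in a data file.
FIELDS = {
    "element": "Chapter",
    "number": "3",
    "title": "Using a Mouse",
    "subtitle": "",
}

# Hypothetical templates: one like an internal xref, one for olinks.
XREF_STYLE = '{element} {number}: "{title}"'
OLINK_STYLE = '"{title}"'


def generated_text(fields, template):
    """Fill a template from the stored text pieces."""
    return template.format(**fields)


print(generated_text(FIELDS, XREF_STYLE))
print(generated_text(FIELDS, OLINK_STYLE))
```

Because the pieces are stored separately, the olink form can drop the chapter number without touching the source document.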

Adding the document title

You may want to make it obvious that olink cross references are going to another document rather than the current document. This is more important for print output than HTML output, because with print output the reader must locate the other document. If you set the parameter olink.doctitle to nonzero, then the stylesheet will append the other document's title to the reference. The title is taken from the title child element of the root element in the external document.
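As a rough picture of that behavior, the following Python sketch reads the title child of a document's root element and appends it to some link text. The joining word "in" and the function names are invented for the example; the wording the stylesheets actually produce may differ.

```python
import xml.etree.ElementTree as ET

ADMIN_GUIDE = """<book id="adminguide">
  <title>Mail Administrator's Guide</title>
  <chapter id="user_accounts">
    <title>Administering User Accounts</title>
  </chapter>
</book>"""


def document_title(xml_text):
    """Return the title child of the root element, or None if there is none."""
    return ET.fromstring(xml_text).findtext("title")


def with_doctitle(link_text, doc_title):
    """Append the document title to the link text when a title is available."""
    # The joining phrase " in " is an assumption made for this sketch.
    return f"{link_text} in {doc_title}" if doc_title else link_text


print(with_doctitle("Administering User Accounts", document_title(ADMIN_GUIDE)))
```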