Subject: Re: [docbook-apps] Dynamic web serving of large Docbook


Frans,

Reading through your message a little more...

[...]

> The perfect solution, AFAICT, would be a dynamic, cached, generation. When a 
> certain section is requested, only that part is transformed, and cached for 
> future deliveries. It sounds nice, and sounds like it would be fast.
> 
> I looked at Cocoon(cocoon.apache.org) for helping me with this, and it does 
> many things well; it caches XSLT sheets, the source files, and even 
> CIncludes(same as XIncludes basically).
> 
> However, AFAICT, Docbook makes it not easy:
> 
> * If one section is to be transformed, the sheets must parse /all/ sources, in 
> order to resolve references and so forth. There's no way to workaround this, 
> right? 

It seems like your main requirement, as far as HTML output goes,
is to be able to preserve stable cross-references among your
rendered pages. And you would like to be able to dynamically regenerate
just a certain HTML page without regenerating every HTML page that
it needs to cross-reference.

And, if I understand you right, your requirement for PDF output is
to be able to generate a PDF file with the same content as each
HTML chunk, without regenerating the whole set/book it belongs to.
(At least that's what I take your mention of "chunked PDF" in
your original message to mean.)

(But -- this is just an incidental question -- in the case of the
PDF chunks, you're not able to preserve cross-references between
individual PDF files, right? There's no easy way to do that. Not
that I know of at least.)

If the above is all an accurate description of your requirements,
then I think a partial solution is to:

  - set up the relationship between your source files and HTML
    output such that the DocBook XML source for your parts are
    stored as separate physical files that correspond one-to-one
    with the HTML files in your chunked output

  - use olinks for cross-references (instead of using xref or link)

      http://www.sagehill.net/docbookxsl/Olinking.html
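
For reference, here's a minimal sketch of what the olink setup
might look like (the document names chap1 and chap2 and the file
olinkdb.xml are hypothetical). In your sources, cross-references
would become olinks:

    <olink targetdoc="chap2" targetptr="installing"/>

and a central target database document (olinkdb.xml) would pull
in the target data generated for each part:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE targetset [
    <!ENTITY chap1targets SYSTEM "chap1.target.db">
    <!ENTITY chap2targets SYSTEM "chap2.target.db">
    ]>
    <targetset>
      <targetsetinfo>Olink targets for the whole set</targetsetinfo>
      <sitemap>
        <dir name="htdocs">
          <document targetdoc="chap1" baseuri="chap1.html">
            &chap1targets;
          </document>
          <document targetdoc="chap2" baseuri="chap2.html">
            &chap2targets;
          </document>
        </dir>
      </sitemap>
    </targetset>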

If you were to do those two things, then maybe:

 1. You could do an initial "transform everything" step of your
    set/book file, with the individual XML files brought together
    using XInclude or entities; that would generate your TOC &
    index and one big PDF file for the whole set/book

 2. You would then need to generate a target data file for each
    of your individual XML files, using a unique filename value for
    the targets.filename parameter for each one, and then
    regenerate the HTML page for each individual XML file, and
    also the corresponding PDF output file. (The xsltproc calls
    are sketched just after this list.)

 3. After doing that initial setup once, then each time an
    individual part is requested (HTML page or individual PDF
    file), you could regenerate just that from its corresponding
    XML source file.

    The cross-references in your HTML output will then be
    preserved (as long as the relationship between files hasn't
    changed and you use the target.database.document and
    current.docid parameters when calling your XSLT engine).
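
For what it's worth, with xsltproc those steps might look roughly
like the following (the stylesheet paths and the book/chap1
filenames are hypothetical; the parameters are the standard
DocBook XSL ones):

    # Step 1: transform everything once, pulling the parts
    # together with XInclude, to get the master TOC and index
    xsltproc --xinclude \
        --stringparam base.dir htdocs/ \
        --stringparam target.database.document olinkdb.xml \
        docbook-xsl/html/chunk.xsl book.xml

    # Step 2: collect olink target data for one part, using a
    # unique value of targets.filename for each part
    xsltproc --stringparam collect.xref.targets only \
        --stringparam targets.filename chap1.target.db \
        docbook-xsl/html/docbook.xsl chap1.xml

    # Step 3: regenerate just that one part's HTML page,
    # resolving its olinks through the shared target database
    xsltproc --stringparam target.database.document olinkdb.xml \
        --stringparam current.docid chap1 \
        -o htdocs/chap1.html docbook-xsl/html/docbook.xsl chap1.xml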

I _think_ that all would work. But Bob Stayton would know best.
(He's the one who developed the olink implementation in the
DocBook XSL stylesheets.)

A limitation of it all is that, if a writer adds a new section to
a document, you're still going to need to regenerate the whole
set/book to get that new section to show up in the master TOC.
Same thing if a writer adds an index marker and wants that marker
to show up in the index.

But one way to deal with that is to do step 3 above on-demand,
and have steps 1 and 2 re-run, via a cron job or equivalent, at
some regular interval -- once a day or once an hour, or at
whatever minimum interval you figure would be appropriate given
how often writers are likely to add new sections or index
markers.
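
For example, a crontab entry along these lines would do it (the
rebuild script here is hypothetical; it would just re-run steps 1
and 2):

    # regenerate the master TOC, index, and olink target data hourly
    0 * * * * /usr/local/bin/rebuild-docbook-master.sh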

And during that interval, of course, there would be some
possibility of an end user not being aware of a certain newly
added section because the TOC hasn't been regenerated yet, and,
similarly, not finding anything about that section in the index.

> * Cocoon specific: It cannot cache "a part" of a transformation, which means 
> the point above isn't workarounded. Right? This would otherwise mean the 
> transformation of all non-changed sources would be cached.

Caching is something that you could do with or without Cocoon,
and something that's entirely separate from the transformation
phase. You
wouldn't necessarily need Cocoon or anything Cocoon-like if you
used the solution above (and if it would actually work as I
think). And using Cocoon just to handle caching would probably be
overkill. I think there are probably some lighter-weight ways to
handle caching.
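
For instance, here's a minimal sketch of mtime-based caching as a
plain shell script (the paths are hypothetical): it re-transforms
a part only when its XML source is newer than the cached HTML,
and otherwise just serves the cached copy.

    #!/bin/sh
    part="$1"
    src="src/$part.xml"
    out="htdocs/$part.html"
    # regenerate only if the cached output is missing or stale
    if [ ! -f "$out" ] || [ "$src" -nt "$out" ]; then
        xsltproc --stringparam target.database.document olinkdb.xml \
            --stringparam current.docid "$part" \
            -o "$out" docbook-xsl/html/docbook.xsl "$src"
    fi
    cat "$out"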

Anyway, I think the solution I described would be some work to set
up -- but you could hire some outside expertise to help you do
that (Bob Stayton comes to mind for some reason...).
