Subject: Re: [docbook-apps] DB2PDF et al. alternative (long). Was: Show off what you've done with Docbook

Hi all, here is my experience (and a few strong opinions) about toolchains for single sourcing from docbook to pdf/html/epub/docx.

In the past I set up, used and maintained various docbook/FO toolchain variations to localize for the Italian market and to publish for print several books from the O'Reilly catalog starting from DB sources. While the docbook editing part worked like a charm using Oxygen, the pdf generation was *very* painful to set up and to maintain. In the end it worked, but it is not something I would recommend for anything but a very simple book/layout. We started using FOP as the pagination engine, but we found that it is not suitable for professional print production of complex books (tables, figures, boxes, sidebars, etc). In the end we used the AntennaHouse engine with its proprietary extensions. Note that today even O'Reilly has abandoned the FO route to produce pdf from DB.

My conclusion is that while DB is *very* good and well supported as a structuring and archiving format, FOP and friends are not a suitable solution for producing professional PDF. Moreover, FOP is at a dead end, as it is being replaced by CSS pagination (see the AntennaHouse and Prince product lines).

I then started using a DB/latex/pdf toolchain, but I usually find this solution not flexible enough (having to edit some code just to move a figure is not something that scales up that well) and I think that the batch pagination paradigm used by tex/fop is not suitable for complex books.

I now routinely use, with great satisfaction and efficiency, a workflow that transforms docbook to idml (the xml format used by Adobe indesign) via xslt pipelines. I then produce typographically perfect PDF interactively from the automatically generated indesign files.

I initially developed the xslt pipelines for transforming to indesign myself, but a few months ago I discovered this nugget:

http://www.le-tex.de/en/transpect.html

It is a game-changer library/framework made by a brilliant German software house to *roundtrip* from/to any word/indesign/xml using as a pivot a format named hubxml, which is simplified docbook + css attributes. Everything has been open sourced.

As an example, these are the out of the box possibilities:
docx/idml to hubxml;
hubxml to html/idml/epub/docx;
interactive proofreading/copyfitting/image refining/pdf production directly in indesign; export from indesign to idml and then conversion back to hubxml for archiving (i.e., true roundtripping)

For going from xml to indesign (idml) all you need is an indesign template with the layout and the typography (note that the indesign template could be created and maintained by a graphic designer who knows absolutely nothing about tags or xml/html/Idml) and a mapping xml configuration file. Tables, images and math formulae are supported almost out of the box.
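To give a rough idea of what such a mapping does, here is a minimal Python sketch of my own. It is not transpect code and it does not use transpect's configuration format. The names STYLE_MAP and to_story_fragment, and the style names "Heading" and "Body", are hypothetical. It assumes un-namespaced docbook input, ignores inline markup, and emits only the bare paragraph/character range nesting of an idml story, not a complete idml file.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: docbook element name -> paragraph style name
# defined in the indesign template (names invented for illustration).
STYLE_MAP = {"title": "Heading", "para": "Body"}


def to_story_fragment(docbook_text, style_map=STYLE_MAP):
    """Build a simplified idml-like Story element from docbook text.

    Only elements listed in style_map are converted; mixed content
    (inline markup inside a paragraph) is ignored in this sketch.
    """
    root = ET.fromstring(docbook_text)
    story = ET.Element("Story")
    for elem in root.iter():
        style = style_map.get(elem.tag)
        if style is None:
            continue  # no mapping for this element, skip it
        paragraph = ET.SubElement(
            story,
            "ParagraphStyleRange",
            AppliedParagraphStyle=f"ParagraphStyle/{style}",
        )
        chars = ET.SubElement(paragraph, "CharacterStyleRange")
        ET.SubElement(chars, "Content").text = (elem.text or "").strip()
    return story


if __name__ == "__main__":
    sample = "<section><title>Intro</title><para>Some text.</para></section>"
    print(ET.tostring(to_story_fragment(sample), encoding="unicode"))
```

The point is only that the xml side refers to style *names*, while the look of each style lives entirely in the template, which is why the designer never needs to see a tag.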

The technology used is standard xslt/xproc/xsd/schematron. There is even a terrific module to check xml files (generated from word processing) against business rules expressed in schematron and then annotate an html version of the sources with warning messages (e.g. to check that only the styles from a controlled vocabulary of styles have been used). The runtime environment is java (saxon and calabash). The software is very robust and very well designed and written. See the above link for all the details.
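As an illustration of the controlled-vocabulary check, here is a small Python sketch of the idea. It is my own simplification, not the transpect module, which expresses such rules in schematron. The names ALLOWED_STYLES and find_style_violations are hypothetical, and I assume here that the style name is carried in a role attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary of style names (invented for illustration).
ALLOWED_STYLES = {"Heading1", "BodyText", "Caption"}


def find_style_violations(xml_text, allowed=ALLOWED_STYLES):
    """Return (element name, style) pairs whose 'role' is not in 'allowed'."""
    root = ET.fromstring(xml_text)
    violations = []
    for elem in root.iter():
        style = elem.get("role")  # assumption: style name stored in @role
        if style is not None and style not in allowed:
            violations.append((elem.tag, style))
    return violations


SAMPLE = """<chapter>
  <title role="Heading1">Intro</title>
  <para role="BodyText">Some text.</para>
  <para role="MyCustomStyle">Ad hoc formatting.</para>
</chapter>"""

if __name__ == "__main__":
    for tag, style in find_style_violations(SAMPLE):
        print(f"warning: <{tag}> uses unlisted style '{style}'")
```

In the real workflow the warnings are attached to an html rendering of the document rather than printed, so an editor can see them in context.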

I have now extensive experience with this library/framework and I use it already in production for a couple of clients. I am standardizing everything on this.

I'll soon have a hosted web environment in public beta. If someone is interested, please drop me a private message.

Kind regards,
__peppo
