
docbook-apps message


Subject: RE: [docbook-apps]PDF downconversion to docBook XML


It's great to hear there's another publisher looking at DocBook.

My company, XML Press, uses DocBook whenever possible.

Regarding conversion, I have not converted from PDF to DocBook, but I
have converted from HTML to DocBook with reasonable success. From HTML,
here is what I do:

1) Use tidy (http://tidy.sourceforge.net/) to generate xhtml from the
   html.
2) (optional) Use XSL to clean up the xhtml.
3) Use herold (http://www.michael-a-fuchs.de/) to generate DocBook XML.
4) (optional) Use XSL for post-processing.
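
The four steps above could be scripted roughly as follows. This is a
minimal sketch, not a tested production script: the file names, the
build_commands and run_pipeline helpers, and the use of xsltproc for the
optional XSL steps are assumptions made for illustration, and the herold
-i/-o options should be checked against the herold version you have
installed.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_commands(html_file: str, work_dir: str,
                   pre_xsl: Optional[str] = None,
                   post_xsl: Optional[str] = None) -> List[List[str]]:
    """Return the command lines for the HTML -> DocBook steps, in order."""
    html = Path(html_file)
    work = Path(work_dir)
    xhtml = work / (html.stem + ".xhtml")
    docbook = work / (html.stem + ".xml")

    # Step 1: tidy turns the html into well-formed xhtml.
    commands = [["tidy", "-asxhtml", "-numeric", "-utf8", "-quiet",
                 "-o", str(xhtml), str(html)]]

    # Step 2 (optional): clean up the xhtml with XSL before herold sees it.
    herold_input = xhtml
    if pre_xsl:
        herold_input = work / (html.stem + ".clean.xhtml")
        commands.append(["xsltproc", "-o", str(herold_input),
                         pre_xsl, str(xhtml)])

    # Step 3: herold converts the xhtml to DocBook XML.
    # Assumption: -i and -o are the input/output options of your herold.
    commands.append(["herold", "-i", str(herold_input), "-o", str(docbook)])

    # Step 4 (optional): post-process the DocBook with XSL.
    if post_xsl:
        final = work / (html.stem + ".final.xml")
        commands.append(["xsltproc", "-o", str(final),
                         post_xsl, str(docbook)])
    return commands


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first real failure."""
    for cmd in commands:
        result = subprocess.run(cmd)
        # tidy exits with status 1 when it only issued warnings.
        warnings_only = cmd[0] == "tidy" and result.returncode == 1
        if result.returncode != 0 and not warnings_only:
            raise RuntimeError("step failed: " + " ".join(cmd))


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    for cmd in build_commands("chapter1.html", "work",
                              pre_xsl="pre.xsl", post_xsl="post.xsl"):
        print(" ".join(cmd))
```

Leaving out pre_xsl and post_xsl gives just steps 1 and 3. Call
run_pipeline on the returned list once tidy, herold, and xsltproc are
on your PATH.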

Just using steps 1 and 3 works pretty well, but depending on what the
original html looks like and what you need the final DocBook to look
like, you may need some additional processing. Fortunately, once you
have xhtml, you can use xsl for processing before running herold, and
you can, of course, do some cleanup with xsl afterwards.
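
To give a feel for the kind of cleanup that can happen between tidy and
herold, here is a small sketch that strips presentational attributes
from the xhtml. Note that it uses Python's standard
xml.etree.ElementTree rather than XSL, purely so the example is
self-contained; the list of attributes to drop and the function name
are assumptions for illustration, and what actually needs cleaning will
depend on your source html.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical list of attributes to drop before conversion.
PRESENTATIONAL_ATTRS = ("style", "class", "align", "bgcolor")


def strip_presentational_attrs(xhtml_text: str) -> str:
    """Return the xhtml with presentational attributes removed."""
    # Serialize XHTML elements without an invented namespace prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    for element in root.iter():
        for name in PRESENTATIONAL_ATTRS:
            element.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="intro" style="color:red">Hello</p>'
    '</body></html>'
)
cleaned = strip_presentational_attrs(sample)
print(cleaned)
```

The parser used here does not know named HTML entities such as &nbsp;,
so the input should be run through tidy with numeric entities first.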

Herold works very well, so the pre- and post-processing you need to do
will probably depend mostly on how good your conversion from PDF to HTML
is.

Hope that helps.

Best Regards,
Dick Hamilton
XML Press
XML for Technical Communicators
(970) 231-3624 

> -----Original Message-----
> From: Kurt A Richardson [mailto:kurt@iscepublishing.com] 
> Sent: Friday, June 11, 2010 3:32 PM
> To: docbook-apps@lists.oasis-open.org
> Subject: Re: [docbook-apps]PDF downconversion to docBook XML
>
> Hi list
>
> I am new to DocBook, and XML-based publishing in general. I run a
> small publishing company (30 titles), that specializes in complexity
> theory and I have been looking for ways to not only improve my little
> doc flow methodology, but also make our content available to our
> readers in a variety of new modes and formats. I have been drawn to
> DocBook and the possibility of using XSLT as a means to realize these
> goals. I have little trouble figuring out how to prepare new content
> and am hoping to produce our next two titles purely from DocBook XML.
> However, I also have about 6000 pages of PDFs (not all having the same
> format) that I'd like to 'down convert' to DocBook XML. I am making
> SLOW progress and wondered if anyone here had any bright ideas about
> how to approach this task... e.g., is PDF to html the best first step?
> Or does anyone know of any affordable services being provided to do
> the down conversion for me.
>
> Many thanks in advance for any guidance you can provide.
>
> I'm really rather excited about the possibilities that arise once I
> move our publishing from Adobe CS to XML-based!
>
> Kind regards, Kurt