Subject: RE: DOCBOOK-APPS: How to translate HTML to DocBook
Dave Brooks wrote...
>At 12:53 19/03/2002 +1100, Andrew Westcombe wrote:
>>At 05:00 PM 12/03/2002 -0600, Patrick Hartling wrote:
>>
>>> It also helps if the source is "good" HTML. Having closing tags such
>>> as </li>, </p>, and </br> helps immensely.
>>
>>I've used DocParse myself, it's not bad, and very good value. As for
>>having "good" HTML, Dreamweaver has a very nice command for stripping out
>>junk, esp. from former MSWord files.
>
>HTML Tidy (see http://www.w3.org/People/Raggett/tidy/) is very good for
>cleaning up HTML.

What I like about HTML Tidy is that it can replace <FONT...> and
similar elements with more standard elements plus CSS classes.
It can also produce XML output, so the differences between the
original HTML and the desired DocBook XML become even smaller.
After that, with a good text editor of your choice ;-), it is
much, much easier to get to the final result.

I tried to XMLize the HTML from MS Word 97 earlier, before I knew
about HTML Tidy. It was painful even with Perl at hand. The (free)
HTML Tidy can really save a lot of work.

HTH,
  Petr

--
Petr Prikryl, Skil, spol. s r.o., (prikrylp@skil.cz)
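For anyone who wants to script the cleanup step described above, here is a minimal sketch in Python. It assumes that a `tidy` executable is on your PATH and that your build accepts the `-clean`, `-asxml`, `-quiet` and `-o` options (check `tidy -help` for your version). The function names `tidy_command` and `run_tidy` and the file names are made up for illustration. It only covers the Tidy pass, the remaining HTML-to-DocBook edits are still done by hand or with other tools.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(src, dst):
    """Build the argument list for one HTML Tidy run (hypothetical helper)."""
    return [
        "tidy",
        "-quiet",        # suppress non-essential output
        "-clean",        # replace <FONT> and similar markup with CSS classes
        "-asxml",        # write well-formed XML instead of plain HTML
        "-o", str(dst),  # output file (assumed to be supported by your build)
        str(src),
    ]


def run_tidy(src, dst):
    """Run HTML Tidy on src and return the path of the cleaned file."""
    if shutil.which("tidy") is None:
        raise RuntimeError("HTML Tidy ('tidy') was not found on PATH")
    result = subprocess.run(tidy_command(src, dst), capture_output=True, text=True)
    # Tidy exits with 0 (no problems), 1 (warnings) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return Path(dst)
```

Typical use would be something like `run_tidy("word-export.html", "word-export.xml")`, after which the XML file can be opened in a text editor for the remaining changes.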