Subject: RE: DOCBOOK-APPS: From RTF to DocBook
Using Tidy, you can convert "dirty HTML" into "clean XHTML" and then do XSLT transformations on the XHTML. There's an example stylesheet on the DocBook Wiki for converting XHTML into DocBook, which can be a starting point. As Petr said, there's always a lot of handwork involved when converting from visual markup into DocBook.

Jeff

-----Original Message-----
From: Prikryl,Petr [mailto:PRIKRYLP@skil.cz]
Sent: Friday, July 12, 2002 8:23 AM
To: kangoo@tiscali.fr; docbook-apps@lists.oasis-open.org
Subject: RE: DOCBOOK-APPS: From RTF to DocBook

Sebastien wrote...
> I am looking for a good tool able to convert from Doc/RTF
> to DocBook if possible. I investigated and found Majix,
> which converts to Simplified DocBook, but the conversion
> does not seem to be very good, and no support has been
> given to the product since 1999. Therefore no RTF 1.6
> support.
>
> I found UpCast, which converts to an intermediary format,
> and after that I have to go through XSLT transformations. The
> conversion suits me, but I would like to know if there is
> a good product (preferably Open Source) able to convert
> from RTF to DocBook without big losses of information.
> [...]

I doubt that there is a really good tool that produces good DocBook sources from general Doc/RTF documents. The problem is that Doc/RTF is rather visual-markup oriented, while DocBook is very structural-markup oriented. The conversion from visual to structural can only ever be a guess (unless the visual markup follows some very strict rules). For that reason, I think there will always be a lot of hand work when cleaning up the source (pick a good editor with regular expressions).

For the first transformation of Doc/RTF, I tried exporting to HTML (directly from MS Word). The HTML produced is extremely ugly and really crippled. But there is the "tidy" utility (mentioned on the W3C main page, with its home at http://tidy.sourceforge.net/). Tidy is able to convert the crippled HTML into excellent HTML, with CSS classes used instead of all the <FONT ...> etc.

Then, using a good editor, I would clean up the HTML, changing it from visual into structural markup. Then I would convert the HTML into XML (DocBook). A wild guess: if I remember well, xsltproc is able to read HTML -- so you could do some XSLT transformations of the cleaned HTML (I have no experience with this).

Regards,
Petr
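
The last step described above, turning cleaned-up XHTML into DocBook, might look roughly like the sketch below. It is not the DocBook Wiki stylesheet and does not use XSLT; it is a minimal Python illustration that assumes the input is already well-formed XHTML (for example, after Tidy and some hand cleanup). The function names and the element mapping are hypothetical, headings are turned into flat (non-nested) sections, and anything not in the mapping is either flattened to plain text or skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping of XHTML inline elements to DocBook inline elements.
INLINE_MAP = {"em": "emphasis", "i": "emphasis", "code": "literal", "tt": "literal"}


def local(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.split("}", 1)[-1]


def copy_inline(src, dst):
    """Copy text and mapped inline children of src into dst.

    Inline elements without a mapping (span, font, ...) are flattened to text.
    """
    dst.text = src.text
    last = None  # most recently created child of dst
    for child in src:
        name = INLINE_MAP.get(local(child.tag))
        if name:
            last = ET.SubElement(dst, name)
            copy_inline(child, last)
            last.tail = child.tail
        else:
            text = "".join(child.itertext()) + (child.tail or "")
            if last is None:
                dst.text = (dst.text or "") + text
            else:
                last.tail = (last.tail or "") + text


def xhtml_to_docbook(xhtml):
    """Convert a small subset of XHTML (h1-h3, p, ul, ol) into a DocBook article."""
    root = ET.fromstring(xhtml)
    body = next(el for el in root.iter() if local(el.tag) == "body")
    article = ET.Element("article")
    current = article  # container that receives paragraphs and lists
    for el in body:
        name = local(el.tag)
        if name in ("h1", "h2", "h3"):
            # Simplification: every heading opens a new top-level section.
            current = ET.SubElement(article, "section")
            copy_inline(el, ET.SubElement(current, "title"))
        elif name == "p":
            copy_inline(el, ET.SubElement(current, "para"))
        elif name in ("ul", "ol"):
            kind = "itemizedlist" if name == "ul" else "orderedlist"
            lst = ET.SubElement(current, kind)
            for li in el:
                if local(li.tag) == "li":
                    item = ET.SubElement(lst, "listitem")
                    copy_inline(li, ET.SubElement(item, "para"))
        # Other block elements (tables, images, ...) are silently skipped here.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<h1>Intro</h1><p>Some <em>text</em> here.</p>"
        "<ul><li>one</li><li>two</li></ul>"
        "</body></html>"
    )
    print(xhtml_to_docbook(sample))
```

A real document would still need the hand cleanup discussed in this thread: deciding which visually styled paragraphs are really headings, notes, or code is exactly the guesswork a mapping like this cannot do.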