OASIS Mailing List Archives
docbook-apps message



Subject: Re: [docbook-apps] Converting MS Word documents to DocBook 5 XML


Hi Jeff,
Any conversion from Word to XML (DocBook or another schema) depends heavily on how well your Word files are structured by means of styles, so you will probably need to massage and fix the files in Word by hand before the actual conversion can be done effectively.

Unfortunately, the style features offered by Word are quite a mess and (IMHO) not robust enough, so more often than not you end up with quite "dirty" files. I have found that the time spent in Word can often be better used adding tags manually in an XML editor. For this I use Oxygen in Author mode, as it has a *very* useful feature: a quite intelligent paste from Word, where many inlines and sectioning elements get translated automatically to DocBook. This way you can just do a quick cleanup in Word (search&replace based on styles is your friend here) to keep just the inlines, the section titles, lists and tables, and then copy&paste from Word into a blank DocBook file in Oxygen (Oxygen has templates for both DB4 and DB5). You will end up with all the paras, sections and main structures already tagged, and from that point on you can work directly in a structured editing environment to finalize the markup. I don't remember whether footnotes get converted correctly, but you can do a quick test on this.

Your mileage may vary depending on the complexity of your source files, but this "manual" approach is often the quickest and most accurate, strange as it may seem.

__peppo



On Thu, May 24, 2012 at 7:38 AM, Jeff Powanda <jpowanda@vocera.com> wrote:

What’s the easiest way to convert MS Word 2007 documents to DocBook 5 XML?

 

I’ve tried using the DocBook roundtrip stylesheets. They seemed to work OK if I did the following:

1. Copied the DocBook styles in template.dot to the document.

2. Applied the DocBook styles to the document.

3. Saved the document as a Word 2003 XML file.

4. Converted the Word 2003 XML file to DocBook 5 XML.

 

This worked OK, but it was a lot of work to apply the DocBook styles to the document (and there are several documents to convert). Also, the resulting DocBook XML file has dbk namespace prefixes on all the elements. How do I remove them?
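
On the dbk prefix question: the prefixed form is namespace-equivalent to the unprefixed one, so removing the prefixes is only a cosmetic re-serialization. A minimal sketch of one way to do it, assuming Python's standard xml.etree.ElementTree is acceptable and that the prefix is bound to the DocBook 5 namespace http://docbook.org/ns/docbook. The function name drop_docbook_prefix and the sample document are hypothetical, not part of the roundtrip stylesheets:

```python
import xml.etree.ElementTree as ET

# DocBook 5 namespace URI; assumed to be what the dbk prefix is bound to.
DOCBOOK_NS = "http://docbook.org/ns/docbook"


def drop_docbook_prefix(xml_text: str) -> str:
    """Re-serialize so DocBook elements use the default namespace, not a prefix."""
    # Registering the empty prefix makes ElementTree write xmlns="..." on the
    # root element. The registration is process-wide.
    ET.register_namespace("", DOCBOOK_NS)
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample standing in for the stylesheet output.
sample = (
    '<dbk:article xmlns:dbk="http://docbook.org/ns/docbook" version="5.0">'
    "<dbk:title>Example</dbk:title>"
    "<dbk:para>Converted from Word.</dbk:para>"
    "</dbk:article>"
)

print(drop_docbook_prefix(sample))
```

Note that the default ElementTree parser discards comments and processing instructions, so this is only suitable if the converted files do not rely on them.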

 

I’m not interested in the roundtrip aspect of the roundtrip stylesheets. I just want to get Word content into DocBook 5.

 

Regards,

Jeff Powanda

Vocera Communications, Inc.

 



