OASIS Mailing List Archives

docbook-apps message


Subject: Re: [docbook-apps] Word 2007+ to DocBook


Hi Greg,
as an alternative to the roundtrip/.docx route, I have found a __very__
satisfactory MS Word to DocBook solution using the fantastic macro
written by Michal Kebrt, which you can find at
http://wordtolatex.sourceforge.net.

The idea is to transform a plain old Word .doc file into a custom
intermediate "flat" XML and then transform this to DocBook using a
fairly simple XSLT (around 250 lines, not optimized).

The intermediate XML can have a flat structure like this (__much__
simpler and more readable than the docx format):

<para><hdg1>chapter title</hdg1></para>
<para><style name="wordStyle1">a para</style></para>
<para><style name="wordStyle2">a para</style></para>
<para><style name="wordStyle3">a para</style></para>
<para><hdg2>section title</hdg2></para>
<para><style name="wordStyle3">a para</style></para>
<image fileref="img43.png" width="168" format=""/>
<para><style name="wordStyle3">a para</style></para>
<table>....</table>
...
...

Note that the only tags at the first level are (para|image|table)*, to
keep the XSLT for transforming to DocBook fairly simple. You can then
use sibling relationships to build a nested DocBook structure
based on the inner style tags.
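
To illustrate the sibling-based nesting described above, here is a
minimal sketch in Python (not the XSLT mentioned in this message, which
is not shown here). The <doc> wrapper element, the <book> output root
and the hdg1 -> chapter / hdg2 -> section mapping are assumptions made
for the example. The inner <style> tags are left untouched.

```python
import xml.etree.ElementTree as ET

# Sample flat input; the <doc> root element is a hypothetical wrapper.
FLAT = """<doc>
<para><hdg1>chapter title</hdg1></para>
<para><style name="wordStyle1">a para</style></para>
<para><hdg2>section title</hdg2></para>
<para><style name="wordStyle3">a para</style></para>
<image fileref="img43.png" width="168" format=""/>
</doc>"""

# Assumed mapping: heading tag -> (DocBook container, nesting depth).
HEADINGS = {"hdg1": ("chapter", 1), "hdg2": ("section", 2)}


def nest(flat_root):
    """Group flat siblings under the most recent heading of each level."""
    book = ET.Element("book")
    stack = [(0, book)]  # (depth, open container), innermost last
    for node in flat_root:
        first = node[0] if node.tag == "para" and len(node) else None
        if first is not None and first.tag in HEADINGS:
            name, depth = HEADINGS[first.tag]
            # Close every container at the same or a deeper level.
            while stack[-1][0] >= depth:
                stack.pop()
            container = ET.SubElement(stack[-1][1], name)
            ET.SubElement(container, "title").text = first.text
            stack.append((depth, container))
        else:
            # Ordinary content belongs to the innermost open container.
            stack[-1][1].append(node)
    return book


book = nest(ET.fromstring(FLAT))
print(ET.tostring(book, encoding="unicode"))
```

Because the first level only contains para, image and table, a single
pass over the siblings with a stack of open containers is enough.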

Obviously you first have to clean up the input Word file
(apply/remove/rework styles as needed).

Word2Latex has a nice configuration file (and GUI) to map the various
standard Word structures to custom XML.

I have not tested this method with complicated nested lists and the
like (and I do not claim that this method can convert __any__ Word
document!), but it works surprisingly well with all the normal Word
structures (footnotes, index entries, custom para styles, etc.). We
use this method routinely to convert non-technical books exported from
XPress/InDesign.

If you are interested, I can send you my DTD to validate the
intermediate XML and an XSLT for going from the intermediate XML to
DocBook.

This method works only under Windows (you need Microsoft OLE
automation) and is probably not as general as the DocBook roundtrip
XSLT but, in my view, has the advantage of flexibility and
simplicity: with a single configuration you can directly map custom
Word styles to DocBook role attributes of paras (and you pay for this
flexibility with the need for a pipeline of 2 XSLT).
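
As a rough illustration of the style-to-role mapping, here is a small
Python sketch (again a stand-in, not the actual configuration or XSLT).
The ROLE_MAP table and its role values are hypothetical. Unmapped
styles simply reuse the Word style name as the role.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word style names to DocBook role values.
ROLE_MAP = {"wordStyle1": "epigraph", "wordStyle2": "note"}


def style_to_role(para):
    """Turn <para><style name="X">..</style></para> into <para role="..">..</para>."""
    style = para.find("style")
    if style is None:
        return para  # nothing to map, return the para unchanged
    out = ET.Element("para")
    name = style.get("name", "")
    # Fall back to the Word style name when no mapping is configured.
    out.set("role", ROLE_MAP.get(name, name))
    out.text = style.text
    out.extend(list(style))  # keep any inline children of <style>
    out.tail = para.tail
    return out


src = ET.fromstring('<para><style name="wordStyle1">a para</style></para>')
result = ET.tostring(style_to_role(src), encoding="unicode")
print(result)
```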

I have made some tests with OpenOffice "Save as DocBook", but the
results were less than satisfactory, at least for me (the DocBook OO
XSLT seems quite old and apparently not actively maintained).

Regards,
__peppo

On Thu, May 6, 2010 at 10:29 PM,  <gpevaco@aol.com> wrote:
> Howdy DocBook Community:
>
> I am new to DocBook, and also new to this forum. I have been going through
> the archives, and found some very interesting discussions. Primarily I am
> interested in moving/converting some documents from Word which they were
> authored in to DocBook.

