OASIS Mailing List Archives



docbook-apps message


Subject: Re: [docbook-apps] Easiest way to convert Word .doc or .rtf to DocBook?

What do you mean by "massive amount"? Are the files huge, or do you have 1,000 files?
Some people use the Word to HTML to DocBook route, others a more direct conversion such as Majix. Either way, how much cleanup you have to do after the conversion is what matters.
I use Majix, and I have tweaked it so that it produces fairly clean DocBook (i.e., usually no editing afterwards). I could make an install package available to you if you like. I am a maintainer of the Majix package; I have not released new packages lately, and there have been many DocBook-related changes in the past year. I got on the maintainer list because it appeared to be an inactive project and I had many DocBook-related changes that needed to get into the source tree.
I would suggest trying several of the available packages on a single file to see what each one produces. The quality of that output may matter more than which package you pick: with a "massive amount" of documents, imperfect output could add up to a lot of "fix up", and I assume you want to keep your post-conversion handling to zero ;-)
One other caution: Word formats span 20+ years, and not all Word versions produce the same RTF. I have had problems converting Word97 files that went away once I saved them as Word2003, and vice versa. Weird. I would be interested to see what FrameMaker RTF looks like and how it goes through Majix.
Dean Nelson
In a message dated 12/2/2011 8:53:26 A.M. Pacific Standard Time, bobs@sagehill.net writes:
I've had good results using dbdoclet. I first let Word convert the content to HTML using Save As -> Webpage (filtered), and then apply dbdoclet to the HTML to generate DocBook XML. That approach lets Word handle all of Word's many coding options and quirks, filtering them down to something more standardized to convert. dbdoclet is a Java toolkit, one part of which converts HTML to DocBook.
Bob Stayton
Sagehill Enterprises
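To make the second step of that pipeline (filtered HTML to DocBook) more concrete, here is a toy sketch in Python. It is not dbdoclet and does not reflect how dbdoclet works internally; the class `ToyHtmlToDocBook` and the function `convert` are hypothetical names. It assumes Word's filtered HTML uses plain `<h1>`/`<h2>` headings and `<p>` paragraphs, maps each heading to a new `<section>` with a `<title>`, maps each paragraph to a `<para>`, and keeps only the text of any other inline markup.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ToyHtmlToDocBook(HTMLParser):
    """Hypothetical toy converter (NOT dbdoclet).

    Maps <h1>/<h2> to <section><title>, and <p> to <para>.
    Inline tags such as <b> are dropped, but their text is kept.
    """

    HEADINGS = {"h1", "h2"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.article = ET.Element("article")
        self.section = None  # the <section> opened by the most recent heading
        self.current = None  # the element currently collecting text

    def handle_starttag(self, tag, attrs):
        # Attributes such as Word's class=MsoNormal are ignored.
        if tag in self.HEADINGS:
            self.section = ET.SubElement(self.article, "section")
            self.current = ET.SubElement(self.section, "title")
            self.current.text = ""
        elif tag == "p":
            # Paragraphs before the first heading go directly under <article>.
            parent = self.section if self.section is not None else self.article
            self.current = ET.SubElement(parent, "para")
            self.current.text = ""

    def handle_endtag(self, tag):
        if (tag in self.HEADINGS or tag == "p") and self.current is not None:
            # Collapse runs of whitespace left over from the HTML source.
            self.current.text = " ".join(self.current.text.split())
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.current.text += data


def convert(html: str) -> str:
    """Return a DocBook-like XML string for a small subset of HTML."""
    parser = ToyHtmlToDocBook()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.article, encoding="unicode")


if __name__ == "__main__":
    print(convert("<h1>Intro</h1><p class=MsoNormal>Hello <b>world</b></p>"))
```

For simplicity the sketch omits the DocBook 5 namespace and `version` attribute, and it ignores lists, tables, images, and Word paragraph styles; handling those is exactly the work that a real converter, plus the cleanup Dean mentions, has to cover.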
----- Original Message -----
Sent: Friday, December 02, 2011 7:07 AM
Subject: [docbook-apps] Easiest way to convert Word .doc or .rtf to DocBook?


I have to convert a massive amount of Word documents to DocBook for my company. I also have a few FrameMaker documents that will need to be converted. I figure that I can save the .fm files as .rtf and then as .doc files, and follow the conversion process I will use for Word (once I figure out what to use).


I am willing to purchase a tool such as WordPlay by Docsoft, Inc. or Upcast by InfinityLoop. I looked into MajiX, but I found the installation instructions rather confusing. (They may not be confusing to others, but the user doc said I’d have to run some command from Sun’s virtual machine, etc.) I also looked into Yawc briefly, but it requires attaching a new template in Microsoft Word, and I am nervous about doing this in case I corrupt any of my existing Word templates.


Is there a simple, straightforward way to convert Word to XML? If so, I am willing to ask my company to purchase the best tool.


Any input will be appreciated. Thanks so much. - Donna


Donna Saporito | Technical Writer | Technology and Engineering
O: 201-217-3382 | M: 551-655-5721 | H: 973-845-6594| E: dsaporito@antennasoftware.com

Antenna | Deploy Happiness | www.antennasoftware.com



