docbook-apps message

Subject: Re: [docbook-apps] Easiest way to convert Word .doc or .rtf to DocBook?

I've also had good results using dbdoclet (Herold) on HTML from Word, but I also needed to do some cleanup both before and after the conversion.

If the original Word files are reasonably consistent, this will work pretty well once you figure out what cleanup is required; otherwise, it will be a mess.
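
To give a rough idea of what the "before" cleanup can look like, here is a minimal Python sketch that tidies Word's filtered HTML ahead of the conversion step. The function name `clean_word_html` and the specific rules (dropping `class`/`style` attributes, `<o:p>` tags, conditional comments, non-breaking spaces, and empty paragraphs) are assumptions for illustration, not part of dbdoclet or of anyone's actual workflow in this thread; the right rules depend on what your own Word files produce.

```python
import re

# Hypothetical cleanup rules; adjust them to what your own Word export contains.
_COND_COMMENT = re.compile(r'<!--\[if.*?<!\[endif\]-->', re.IGNORECASE | re.DOTALL)
_OFFICE_TAG = re.compile(r'</?o:p\b[^>]*>', re.IGNORECASE)
_TAG = re.compile(r'<[a-zA-Z][^<>]*>')
# Matches class/style attributes whether the value is quoted or unquoted.
_ATTR = re.compile(
    r'''\s+(?:class|style)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)''',
    re.IGNORECASE,
)
_EMPTY_PARA = re.compile(r'<p\b[^>]*>\s*</p>', re.IGNORECASE)


def _strip_attrs(match: re.Match) -> str:
    """Remove class/style attributes from a single opening tag."""
    return _ATTR.sub('', match.group(0))


def clean_word_html(html: str) -> str:
    """Apply a few illustrative cleanup passes to Word-generated HTML."""
    html = _COND_COMMENT.sub('', html)    # Office conditional comments
    html = _OFFICE_TAG.sub('', html)      # <o:p> and </o:p> tags
    html = _TAG.sub(_strip_attrs, html)   # attributes inside tags only
    html = html.replace('&nbsp;', ' ')    # non-breaking space entities
    html = _EMPTY_PARA.sub('', html)      # paragraphs left with no content
    return html


if __name__ == '__main__':
    sample = '<p class=MsoNormal style="margin:0">Hello&nbsp;world<o:p></o:p></p>'
    print(clean_word_html(sample))
```

Regular expressions are a blunt tool for HTML, so treat this as a starting point for consistent files rather than a general-purpose cleaner; the "after" cleanup on the generated DocBook is a separate job.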

Given the volume, you may want to go with one of the conversion companies. Stilo (stilo.com) and Data Conversion Laboratory (dclab.com) are the two that come to mind immediately, but I'm sure there are others. They are set up to do this and will almost certainly save you time, and maybe money, in the end.

BTW, the book mentioned in my signature was written in Word and converted to DocBook using dbdoclet, so it does work. But the author was well disciplined in his use of Word, which is probably not going to be the case if you have massive amounts of Word documents :-).

Dick Hamilton
XML Press
New from XML Press:
The Secret Life of Word: A Professional Writer's Guide to Microsoft Word Automation

On Dec 2, 2011, at 8:53 AM, Bob Stayton wrote:

> I've had good results using dbdoclet. I first let Word convert the content to HTML using Save As -> Webpage (filtered), and then apply dbdoclet to the HTML to generate DocBook XML. That approach lets Word handle all of Word's many coding options and quirks, filtering them down to something more standardized to convert. dbdoclet is a Java toolkit, one part of which is for converting HTML to DocBook.
> Bob Stayton
> Sagehill Enterprises
> bobs@sagehill.net
> ----- Original Message -----
> From: Donna Saporito
> To: docbook-apps@lists.oasis-open.org
> Sent: Friday, December 02, 2011 7:07 AM
> Subject: [docbook-apps] Easiest way to convert Word .doc or .rtf to DocBook?
> Hi,
> I have to convert a massive number of Word documents over to DocBook for my company. I also have a few FrameMaker documents that will need to be converted. I figure that I can save the .fm files as .rtf and then .doc files and follow the conversion process I will use for Word (once I figure out what to use).
> I am willing to purchase a tool such as WordPlay by Docsoft, Inc. or Upcast by InfinityLoop. I looked into MajiX, but I thought the installation instructions were rather confusing. (They may not be confusing to others, but the user doc said I'd have to run some command from Sun's virtual machine, etc.) I also looked into Yawc briefly, but it calls for attaching a new template in Microsoft Word. I am nervous about doing this in case I corrupt any existing Word templates that I have.
> Is there a simple, straightforward way to convert Word to XML? I am willing to ask my company to purchase the best tool.
> Any input will be appreciated. Thanks so much. - Donna
> Donna Saporito | Technical Writer | Technology and Engineering
> O: 201-217-3382 | M: 551-655-5721 | H: 973-845-6594| E: dsaporito@antennasoftware.com
> Antenna | Deploy Happiness | www.antennasoftware.com
