docbook-apps message


Subject: WordML to Docbook problem


Hello,

I am using the WordML scripts in the latest snapshot from SourceForge,
dated today, and trying to round-trip documents between WordML and
DocBook.  However, I'm having a few problems.

The main one (at the moment) is converting existing Word documents to
DocBook.  When I step through the four-part process, everything looks
good until the output of the wordml-final.xsl script.  The DocBook file
lacks a root element; the <w:wordDocument> element that contained
everything else seems to disappear during the last conversion.

When I created a trivial case (if there is such a thing where WordML is
concerned) with two paragraphs and nothing else, I got the following
file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
    xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
    w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
    xml:space="preserve"><o:DocumentProperties><o:Title>Hello
World</o:Title><o:Author>XO Communications</o:Author><o:LastAuthor>XO
Communications</o:LastAuthor><o:Revision>2</o:Revision><o:TotalTime>28</o:TotalTime><o:Created>2006-04-13T18:40:00Z</o:Created><o:LastSaved>2006-04-13T19:08:00Z</o:LastSaved><o:Pages>1</o:Pages><o:Words>2</o:Words><o:Characters>12</o:Characters><o:Company>XO Communications</o:Company><o:Lines>1</o:Lines><o:Paragraphs>1</o:Paragraphs><o:CharactersWithSpaces>13</o:CharactersWithSpaces><o:Version>11.6502</o:Version>
</o:DocumentProperties><w:fonts><w:defaultFonts
            w:ascii="Times New Roman" w:fareast="Times New Roman"
            w:h-ansi="Times New Roman" w:cs="Times New Roman" />
</w:fonts><w:styles><w:versionOfBuiltInStylenames
w:val="4" /><w:latentStyles
            w:defLockedState="off" w:latentStyleCount="156" /><w:style
            w:type="paragraph" w:default="on" w:styleId="Normal"><w:name
                w:val="Normal" /><w:rPr><wx:font
                    wx:val="Times New Roman" /><w:sz
w:val="24" /><w:sz-cs
                    w:val="24" /><w:lang w:val="EN-US" w:fareast="EN-US"
                    w:bidi="AR-SA" />
</w:rPr>
</w:style><w:style w:type="character" w:default="on"
            w:styleId="DefaultParagraphFont"><w:name
                w:val="Default Paragraph Font" /><w:semiHidden />
</w:style><w:style w:type="table" w:default="on"
            w:styleId="TableNormal"><w:name w:val="Normal
Table" /><wx:uiName
                wx:val="Table Normal" /><w:semiHidden /><w:rPr><wx:font
                    wx:val="Times New Roman" />
</w:rPr><w:tblPr><w:tblInd w:w="0" w:type="dxa" /><w:tblCellMar><w:top
                        w:w="0" w:type="dxa" /><w:left w:w="108"
                        w:type="dxa" /><w:bottom w:w="0"
w:type="dxa" /><w:right
                        w:w="108" w:type="dxa" />
</w:tblCellMar>
</w:tblPr>
</w:style><w:style w:type="list" w:default="on"
w:styleId="NoList"><w:name
                w:val="No List" /><w:semiHidden />
</w:style>
</w:styles><w:docPr><w:view w:val="print" /><w:zoom
w:percent="100" /><w:doNotEmbedSystemFonts /><w:proofState
            w:spelling="clean" w:grammar="clean" /><w:attachedTemplate
            w:val="" /><w:defaultTabStop
w:val="720" /><w:punctuationKerning /><w:characterSpacingControl

w:val="DontCompress" /><w:optimizeForBrowser /><w:validateAgainstSchema /><w:saveInvalidXML
            w:val="off" /><w:ignoreMixedContent
w:val="off" /><w:alwaysShowPlaceholderText

w:val="off" /><w:compat><w:breakWrappedTables /><w:snapToGridInCell /><w:wrapTextWithPunct /><w:useAsianBreakRules /><w:dontGrowAutofit />
</w:compat>
</w:docPr><w:body><wx:sect><w:p><w:r><w:t>Hello!</w:t>
</w:r>
</w:p><w:p><w:r><w:t>World!</w:t>
</w:r>
</w:p><w:sectPr><w:pgSz w:w="12240" w:h="15840" /><w:pgMar w:top="1440"
                    w:right="1800" w:bottom="1440" w:left="1800"
                    w:header="720" w:footer="720" w:gutter="0" /><w:cols
                    w:space="720" /><w:docGrid w:line-pitch="360" />
</w:sectPr>
</wx:sect>
</w:body>
</w:wordDocument>

This should print "Hello!" and "World!", each on its own line.

I then ran this file (hw.xml) through the conversion process to DocBook:
xsltproc -o normalized.xml /opt/docbook-xsl-snapshot/wordml/wordml-normalise.xsl hw.xml

xsltproc -o sections.xml /opt/docbook-xsl-snapshot/wordml/wordml-sections.xsl normalized.xml

xsltproc -o blocks.xml /opt/docbook-xsl-snapshot/wordml/wordml-blocks.xsl sections.xml

xsltproc -o hw-db.xml /opt/docbook-xsl-snapshot/wordml/wordml-final.xsl blocks.xml

I ended up with:

<?xml version="1.0"?>
<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">
<para>Hello!</para><para>World!</para>

This file is clearly not well-formed: it contains two top-level <para>
elements and no single root element to enclose them.
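
The problem can be confirmed independently of the XSLT toolchain by
handing the output to any standard XML parser. Below is a minimal sketch
in Python, which is not part of the conversion process described above;
the helper name well_formedness_error is made up for illustration, and
the string is simply the hw-db.xml content quoted above.

```python
import xml.etree.ElementTree as ET

# Output of wordml-final.xsl as quoted in this message (hw-db.xml).
HW_DB = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE para PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"\n'
    '"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd">\n'
    '<para>Hello!</para><para>World!</para>\n'
)


def well_formedness_error(xml_text):
    """Return None if xml_text parses, otherwise the parser's message.

    Hypothetical helper for illustration only. The parser checks
    well-formedness; it does not validate against the DocBook DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # A non-None result means the parser rejected the document.
    print(well_formedness_error(HW_DB))
```

The parser accepts the first <para> as the document element and then
rejects the second one, which matches what the browsers complain about.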

I am running on Slackware Linux 10.1, and have tested this against
current releases of both xsltproc and Saxon with the same results.

I have put all of the interim files along with clean versions of these
at http://www.miburo.net/wordmlExample/ .  Note that I have renamed the
output to hw-db.xml.txt to prevent browsers from complaining about the
XML structure.

Does anyone see what I'm doing wrong?

	Thanks in advance,
	Chuck Harris
	chuckjharris@yahoo.com


