OASIS Mailing List Archives

docbook-apps message


Subject: Re: [docbook-apps] Serializing DB 5 with XOM: wrong encoding


Thanks for the tip. I downloaded the full xom source, modified 
XIncludeDriver.java, and ran ant java to generate a new xom-samples.jar. 
I copied it to my docbook toolchain, replacing the original 
xom-samples.jar. XOM serialization generates UTF-8 encoding now. Makes 
for much better DocBook processing from there.


P.S. The build wasn't quite that simple: it failed at first, complaining 
that it couldn't find jaxen-1.1.3-src. I downloaded that from 
http://jaxen.codehaus.org/releases.html. Thankfully that made the build 
happy.


On 03/22/2011 02:02 PM, Mauritz Jeanson wrote:
> |  -----Original Message-----
> |  From: Denis Bradford
> |
> |  I've been trying to preprocess xincludes in my DocBook 5 build with
> |  xom, using the incantation in Bob Stayton's "Complete Guide":
> |
> |  $ java -cp "xom-1.2.1.jar:xom-samples.jar" \
> |      nu.xom.samples.XIncludeDriver source.xml > serialized.xml
> |
> |  The xincludes resolve just fine, but the serialized doc's encoding
> |  comes out as ISO-8859-1, so xom complains about UTF-8 characters in
> |  the source. The output doc ends, incomplete, with a cascade of xom
> |  Serializer errors.
> |
> |  According to the XOM api doc, it should be possible to specify the
> |  encoding as UTF-8, but I haven't found how to do it from the command
> |  line. Anybody know how (or if there's a better solution)? I'm assuming
> |  the failure is on account of the encoding problem, since the document
> |  seems to process normally otherwise.
>
>
> I just tried to process a couple of UTF-8 documents with XIncludeDriver
> (using XOM 1.2.6), and there were no errors. Unencodable characters were
> escaped as numeric character references in the output.
>
> The encoding is hardcoded in XIncludeDriver.java:
>
>   Serializer outputter = new Serializer(System.out, "ISO-8859-1");
>
> This may be a little unfortunate (and not too hard to fix), but
> XIncludeDriver is just a sample application after all.
>
> Mauritz
>
>
>
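
For anyone wondering why the output encoding matters here, the following
is a rough, JDK-only sketch of the behaviour Mauritz describes: characters
that the target encoding cannot represent are written as numeric character
references, while UTF-8 can carry them as-is. This is not XOM's code. The
class and method names (EncodingEscapeSketch, escapeUnencodable) are made
up for illustration, and the hexadecimal form of the reference is an
assumption on my part.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration only; XOM's Serializer is not used here.
public class EncodingEscapeSketch {

    // Pass through every code point the named encoding can represent and
    // replace the rest with a hexadecimal numeric character reference.
    public static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int width = Character.charCount(codePoint); // 2 for surrogate pairs
            String piece = text.substring(i, i + width);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // e-acute exists in ISO-8859-1; the euro sign (U+20AC) does not.
        String sample = "caf\u00E9 \u20AC5";
        System.out.println(escapeUnencodable(sample, "ISO-8859-1"));
        System.out.println(escapeUnencodable(sample, "UTF-8"));
    }
}
```

With ISO-8859-1 as the target, the euro sign has to be escaped; with UTF-8
nothing does. That is why changing the encoding string passed to the
Serializer constructor in XIncludeDriver.java is enough to get readable
UTF-8 output.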


