Subject: Re: [docbook-apps] Incorrect characters in FO / PDF under windows
Interesting. The c3 82 is the UTF-8 encoding of A-caret (U+00C2), and c2 a0 is the UTF-8 encoding of the non-breaking space character (U+00A0). The stylesheet inserted a single non-breaking space character, and it appears to have been encoded as UTF-8 *twice*. By that I mean each byte of the correct sequence c2 a0 was misinterpreted as an ISO-Latin-1 character and converted a second time to UTF-8: c2 (A-caret in ISO-Latin-1) became c3 82, and a0 (non-breaking space in ISO-Latin-1) became c2 a0.

This doesn't happen when I use the basic Saxon Java class com.icl.saxon.StyleSheet, but it seems to happen with your transformation.

Bob Stayton
Sagehill Enterprises
bobs@sagehill.net

----- Original Message -----
From: "Peer Brink" <brink@riege.com>
To: <docbook-apps@lists.oasis-open.org>
Cc: "Bob Stayton" <bobs@sagehill.net>
Sent: Tuesday, January 06, 2004 5:55 AM
Subject: Re: [docbook-apps] Incorrect characters in FO / PDF under windows

> Bob Stayton wrote:
> > The problem is not with FOP, because the extra
> > characters are in the FO file before FOP sees them.
>
> I agree.
>
> > The separator between the "1." and the title should be
> > one non-breaking space character, which when encoded
> > in the default UTF-8 output encoding should be a two-byte
> > sequence C2 A0 (hexadecimal). When you view the
> > FO file with a typical text editor that assumes ISO-Latin-1
> > encoding, you will see A-caret and a non-breaking space, the characters
> > associated with C2 and A0, respectively, in ISO-Latin-1.
>
> In the correct (Linux-created) version I do in fact find what you are describing:
> c2 a0
>
> But in the incorrect (Windows-created) version there is
> c3 82 c2 a0
>
> How is this non-breaking space character inserted into the FO file? Does the stylesheet have to care about the encoding? In other words: could it be a stylesheet problem? Or is the transformer (Saxon) responsible for the correct encoding?
>
> > Are you using any kind of customization?
>
> Yes. I'm using a stylesheet customization and a DTD customization. But the problem also occurs without these customizations, when using the original stylesheets and DTD.
>
> > Exactly what command
> > are you using to process your files?
>
> I have written my own Java program using javax.xml.transform.Transformer. And I'm setting the encoding to UTF-8 on the org.dom4j.io.OutputFormat before writing the transformed XML document to disk.
>
> > Does the FO output file
> > have an encoding="xxx" in the XML declaration at the top?
> > In my output from Saxon 6.5.3 I see:
> >
> > <?xml version="1.0" encoding="utf-8"?>
>
> My output FO file has the same XML declaration.
>
> Thanks in advance for any help,
> Peer.
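
The byte arithmetic behind the double-encoding explanation can be checked with a few lines of Python. This is only an illustrative sketch, not the code path of the Java program discussed in this thread: it assumes the fault is "UTF-8 bytes read back as ISO-Latin-1 text, then encoded as UTF-8 again", and the helper name hex_bytes is made up for the example.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex, e.g. 'c2 a0'."""
    return " ".join(f"{b:02x}" for b in data)


# The non-breaking space (U+00A0) placed between "1." and the title.
nbsp = "\u00a0"

# Correct serialization: the character is encoded as UTF-8 once.
once = nbsp.encode("utf-8")

# Assumed fault: the UTF-8 bytes are misread as ISO-Latin-1 characters
# (A-caret and non-breaking space) and then encoded as UTF-8 a second time.
misread = once.decode("iso-8859-1")
twice = misread.encode("utf-8")

print(hex_bytes(once))   # c2 a0
print(hex_bytes(twice))  # c3 82 c2 a0
```

The second line of output matches the byte sequence found in the Windows-created FO file, which is consistent with the output being encoded twice somewhere between the transformer and the file on disk.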