Subject: Re: DOCBOOK-APPS: Choosing a characterset for DocBook
On Fri, 15 Mar 2002, Christopher R. Maden wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> At 02:58 AM 3/15/02, Jens Stavnstrup wrote:
> >On Fri, 15 Mar 2002, Christopher R. Maden wrote:
> > > 1) Do all of your entities (i.e., files) have encoding declarations?
> > > What are they? Remember that UTF-8 is the default unless you
> > > explicitly specify a different encoding (or use a byte-order mark,
> > > in which case UTF-16 is the default).
> >
> >The encoding chosed is as stated above ISO-8859-1, and yes that is
> >specified in the XML desclaration statement.
>
> OK - then somehow SAXON isn't honoring that.
>
> > > 2) How are you invoking the parser? From within SAXON, obviously - is
> > > SAXON being called from the command line, or within another program?
> > > What exactly are the parameters it's being passed?
> >
> >From Ant, no specific parameters specified (What are you BTW refering
> >to?)
> >
> >I am still using Saxon 6.4.4, and checking the Change history in 6.5.1, I
> >do not see any specific problem with using ISO-8859-1.
>
> SAXON definitely does not have a problem with ISO 8859-1. So somehow it's
> being told to expect UTF-8. Exactly what are you using in Ant to call
> SAXON? I haven't done a lot of work with Ant - is SAXON being instructed
> to read the documents from the filesystem, or are they being passed as a
> stream of some sort to SAXON?

Yes, Saxon reads from the filesystem. The "exact" Ant command is

  <javac saxon.class ...
    <arg line="file.xml file.xsl saxon.extensions=1"/>
    <classpath="saxon.classpath"/>

So as you see, nothing special. You are right that Saxon does not have
any problem with ISO-8859-1.

> >My problem is not so much which encoding, I choose (If there any bugs
> >(e.g. characters the parser can't accept), I can fix them). But rather
> >trying to avoid my colleagues to ran into these issues.
>
> Once you can get SAXON to correctly read in ISO 8859-1 data, you shouldn't
> have any problems; nearly every Windows and UNIX tool in a western European
> environment can edit this encoding. The biggest problem you'll run into is
> Windows users using the 128-159 range for things like curly quotes and
> ellipses; these characters are control characters in ISO 8859-1, and while
> not illegal, will not mean what the Windows user thinks they mean.

This is exactly the issue. It happens when Word users cut and paste from
a Word document into an XML document that is also edited in Word.
Sometimes Word adds extra characters in the 128-159 range, which are
invisible in the Word document, and which Saxon treats as UTF-8 and
therefore comes to an abrupt halt.

Jens

> ~Chris
> - --
> Christopher R. Maden, Principal Consultant, crism consulting
> DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
> <URL: http://crism.maden.org/consulting/ >
> PGP Fingerprint: BBA6 4085 DED0 E176 D6D4 5DFC AC52 F825 AFEC 58DA
> -----BEGIN PGP SIGNATURE-----
> Version: PGP Personal Privacy 6.5.8
>
> iQA/AwUBPJHVv6xS+CWv7FjaEQKwugCffMf14Ez0TdWE3EuyrGhaZnJGQHUAn3jn
> mFt26glbd7bgFtn2+LqSkP7n
> =qMy1
> -----END PGP SIGNATURE-----
>

--
------------------------------------------------------------------------
Jens Stavnstrup                         Phone :
Danish Defence Research Establishment   Voice : + 45 - 39 15 17 97
Ryvangs Alle 1 - P.O. Box 2715          Fax   : + 45 - 39 29 15 33
DK - 2100 Copenhagen O.                 E-Mail (Internet) :
Denmark                                 js@ddre.dk
------------------------------------------------------------------------
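
The 128-159 problem discussed above can be checked for before a file ever reaches the parser. What follows is only a minimal sketch in Python, not anything Saxon or Ant provides: the names find_c1_bytes and is_valid_utf8 and the sample string are made up for illustration, and the sample assumes the pasted text arrived as Windows-1252 bytes. It reports every byte in the 128-159 (0x80-0x9F) range, shows what that byte means in Windows-1252 versus ISO-8859-1, and confirms that such bytes do not form valid UTF-8.

```python
import unicodedata


def find_c1_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) pairs for bytes in the 128-159 range."""
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: curly quotes and an ellipsis, as Word might paste
# them, stored as Windows-1252 bytes.
sample = "<para>\u201cquoted\u201d text\u2026</para>".encode("cp1252")

hits = find_c1_bytes(sample)
for offset, value in hits:
    # Some bytes in this range are undefined in Windows-1252, hence "replace".
    as_cp1252 = bytes([value]).decode("cp1252", errors="replace")
    # ISO-8859-1 maps every byte to a code point; here they are controls.
    as_latin1 = bytes([value]).decode("iso-8859-1")
    print(
        f"offset {offset}: byte 0x{value:02X}, "
        f"Windows-1252 {as_cp1252!r}, "
        f"ISO-8859-1 category {unicodedata.category(as_latin1)}"
    )

print("valid UTF-8:", is_valid_utf8(sample))
```

Running a check like this over the files before the build would at least point colleagues at the offending offsets, rather than leaving them with a parser error.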