OASIS Mailing List Archives


docbook-apps message



Subject: Re: DOCBOOK-APPS: Choosing a characterset for DocBook



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

At 02:58 AM 3/15/02, Jens Stavnstrup wrote:
>On Fri, 15 Mar 2002, Christopher R. Maden wrote:
> > 1) Do all of your entities (i.e., files) have encoding declarations?  What
> > are they?  Remember that UTF-8 is the default unless you explicitly specify
> > a different encoding (or use a byte-order mark, in which case UTF-16 is the
> > default).
>
>The encoding chosen is, as stated above, ISO-8859-1, and yes, that is
>specified in the XML declaration.

OK - then somehow SAXON isn't honoring that.
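To make the default rule quoted above concrete, here is a minimal Python sketch of how a parser might guess a document's encoding. It is simplified and hypothetical: the function name `sniff_xml_encoding` is made up, it ignores the UTF-32 and EBCDIC cases that Appendix F of the XML 1.0 spec also covers, and it is not how SAXON itself is implemented.

```python
import re

def sniff_xml_encoding(data: bytes) -> str:
    """Simplified guess at an XML document's encoding (not full Appendix F)."""
    # A UTF-16 byte-order mark (big- or little-endian) means UTF-16.
    if data.startswith(b"\xfe\xff") or data.startswith(b"\xff\xfe"):
        return "UTF-16"
    # A UTF-8 byte-order mark is also permitted.
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"
    # Otherwise look for encoding="..." in an ASCII-compatible XML declaration.
    m = re.match(
        rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']', data
    )
    if m:
        return m.group(1).decode("ascii")
    # No BOM and no encoding declaration: UTF-8 is the default.
    return "UTF-8"

print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><book/>'))
print(sniff_xml_encoding(b"<book/>"))
```

The point is the fallback order: a BOM or an explicit declaration wins, and only in their absence does UTF-8 apply, so a correctly declared ISO-8859-1 file should never be read as UTF-8 unless the parser never sees the raw bytes.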

> > 2) How are you invoking the parser?  From within SAXON, obviously - is
> > SAXON being called from the command line, or within another program?  What
> > exactly are the parameters it's being passed?
>
>From Ant, with no specific parameters. (What are you referring to,
>BTW?)
>
>I am still using Saxon 6.4.4, and checking the Change history in 6.5.1, I
>do not see any specific problem with using ISO-8859-1.

SAXON definitely does not have a problem with ISO 8859-1.  So somehow it's 
being told to expect UTF-8.  Exactly what are you using in Ant to call 
SAXON?  I haven't done a lot of work with Ant - is SAXON being instructed 
to read the documents from the filesystem, or are they being passed as a 
stream of some sort to SAXON?
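One plausible cause, shown only as an analogy in Python rather than in SAXON or Ant: if something converts the file to text as UTF-8 before the parser sees it, the encoding declaration never gets a chance to apply. (The SAX `InputSource` documentation describes similar behaviour in Java: a character stream is read directly, disregarding any encoding declaration.) The helper names `parse_raw_bytes` and `decode_as_utf8_first` are hypothetical.

```python
import xml.etree.ElementTree as ET

# An ISO-8859-1 document: byte 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
DOC = b'<?xml version="1.0" encoding="ISO-8859-1"?><para>caf\xe9</para>'

def parse_raw_bytes(data: bytes) -> str:
    # Handing the parser raw bytes lets it read the encoding declaration itself.
    return ET.fromstring(data).text

def decode_as_utf8_first(data: bytes) -> str:
    # Simulates a caller that turns bytes into text with the wrong charset
    # before the parser ever sees the declaration.
    return data.decode("utf-8")

print(parse_raw_bytes(DOC))  # café
try:
    decode_as_utf8_first(DOC)
except UnicodeDecodeError as err:
    print("UTF-8 decoding failed:", err.reason)
```

If Ant (or anything else in the chain) is handing SAXON a pre-decoded character stream instead of a file or byte stream, that would explain the symptom.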

>My problem is not so much which encoding I choose (if there are any bugs,
>e.g. characters the parser can't accept, I can fix them), but rather
>keeping my colleagues from running into these issues.

Once you can get SAXON to correctly read in ISO 8859-1 data, you shouldn't 
have any problems; nearly every Windows and UNIX tool in a western European 
environment can edit this encoding.  The biggest problem you'll run into is 
Windows users typing characters from the 128-159 (0x80-0x9F) range, which the 
Windows-1252 code page uses for things like curly quotes and ellipses; in 
ISO 8859-1 these bytes are control characters, and while not illegal, they 
will not mean what the Windows user thinks they mean.
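A quick way to catch such characters before they cause confusion is to scan for bytes in 0x80-0x9F. This Python sketch (the helper name `find_c1_bytes` is hypothetical) reports each one along with what Windows-1252 would display for it; a few bytes in that range are undefined even in Windows-1252.

```python
def find_c1_bytes(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, Windows-1252 reading) for each byte in 0x80-0x9F."""
    hits = []
    for offset, byte in enumerate(data):
        if 0x80 <= byte <= 0x9F:
            try:
                # What a Windows user probably meant by this byte.
                meant = bytes([byte]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no cp1252 mapping.
                meant = "(undefined in cp1252)"
            hits.append((offset, meant))
    return hits

sample = b"He said \x93hi\x94\x85"
print(find_c1_bytes(sample))
# Read as ISO 8859-1, the same byte is just an invisible control character.
print(repr(sample.decode("latin-1")[8]))
```

Running something like this over the source files (or converting the offending bytes to character references such as `&#8220;`) keeps the ISO 8859-1 files meaning what their authors intended.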

~Chris
- -- 
Christopher R. Maden, Principal Consultant, crism consulting
DTDs/schemas - conversion - ebooks - publishing - Web - B2B - training
<URL: http://crism.maden.org/consulting/ >
PGP Fingerprint: BBA6 4085 DED0 E176 D6D4  5DFC AC52 F825 AFEC 58DA
-----BEGIN PGP SIGNATURE-----
Version: PGP Personal Privacy 6.5.8

iQA/AwUBPJHVv6xS+CWv7FjaEQKwugCffMf14Ez0TdWE3EuyrGhaZnJGQHUAn3jn
mFt26glbd7bgFtn2+LqSkP7n
=qMy1
-----END PGP SIGNATURE-----


