docbook-apps message


Subject: Re: DOCBOOK-APPS: Bad Continuation of Multi-Byte UTF-8 Sequence


Michael Westbay wrote:

> While the encoding is part of the specification, it's optional to support
> multiple encodings.  Saxon, for example, only supports UTF-8, USASCII, and
> ISO-8859-1 (all of which are exact subsets of UTF-8).

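ISO-8859-1 is not a subset of UTF-8. A stream of bytes that represents
text in ISO-8859-1 is not, in general, a valid UTF-8 stream: any byte
above 0x7F is a complete character in ISO-8859-1 but only part of a
multi-byte sequence in UTF-8. Only a US-ASCII stream is always a valid
UTF-8 stream as well.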
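
To illustrate the point, here is a minimal sketch in Python (not part of
the original exchange; the helper name is_valid_utf8 and the sample
string are my own assumptions). It encodes the same text three ways and
checks which byte streams a strict UTF-8 decoder accepts.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # error handling is strict by default
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII text: identical bytes in US-ASCII, ISO-8859-1 and UTF-8.
ascii_bytes = "cafe!".encode("ascii")

# U+00E9 (e with acute) is the single byte 0xE9 in ISO-8859-1 ...
latin1_bytes = "caf\u00e9!".encode("iso-8859-1")

# ... but the two-byte sequence 0xC3 0xA9 in UTF-8.
utf8_bytes = "caf\u00e9!".encode("utf-8")

print(is_valid_utf8(ascii_bytes))   # True
print(is_valid_utf8(latin1_bytes))  # False
print(is_valid_utf8(utf8_bytes))    # True
```

In the ISO-8859-1 stream, 0xE9 looks to a UTF-8 decoder like the start of
a three-byte sequence, and the following "!" (0x21) is not a valid
continuation byte. That is the same kind of failure named in the subject
of this thread.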

> You must not deal with languages that have multiple encodings.  The reason I
> prefer to use Xalan/Xerces over Saxon is this very issue: the Apache XML/XSL
> tools allow the encoding to be specified on a per-document basis.  The loss
> in speed is made up for in versatility.

You can still use Saxon and use the -x and -y parameters to change the
parser used to process the XML and XSL files. For example, I am using the
Crimson parser, which supports every encoding supported by my JVM; that
is about 150 different encodings.

-----------------------------------------------------------------
  Jirka Kosek  	                     
  e-mail: jirka@kosek.cz
  http://www.kosek.cz

