Subject: Re: DOCBOOK-APPS: Bad Continuation of Multi-Byte UTF-8 Sequence
Michael Westbay wrote:

> While the encoding is part of the specification, it's optional to support
> multiple encodings. Saxon, for example, only supports UTF-8, USASCII, and
> ISO-8859-1 (all of which are exact subsets of UTF-8).

ISO-8859-1 is not a subset of UTF-8. A stream of bytes that represents text in the ISO-8859-1 encoding is not, in general, a valid UTF-8 stream. Only a US-ASCII stream is also a valid UTF-8 stream.

> You must not deal with languages that have multiple encodings. The reason I
> prefer to use Xalan/Xerces over Saxon is this very issue, the Apache XML/XSL
> tools allow the encoding to be specified on a per document basis. The loss
> in speed is made up for in versatility.

You can still use Saxon and pass the -x and -y parameters to change the parser used to process the XML and XSL files. E.g., I am using the Crimson parser, which supports all encodings supported by my JVM; that is about 150 different encodings.

-----------------------------------------------------------------
Jirka Kosek
e-mail: jirka@kosek.cz
http://www.kosek.cz
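
To make the point about ISO-8859-1 and UTF-8 concrete, here is a minimal Java sketch. It is only an illustration, not part of any of the tools discussed: the class name EncodingCheck and the helper isValidUtf8 are made up for this example, and it assumes a JVM recent enough to provide java.nio.charset.StandardCharsets (Java 7 or later). It encodes the same short text in three ways and asks a strict UTF-8 decoder whether each byte stream is well formed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only for this illustration.
final class EncodingCheck {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // "cafe" with e-acute, written as an escape

        // ISO-8859-1 stores e-acute as the single byte 0xE9.
        byte[] latin1 = accented.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 stores the same character as the two bytes 0xC3 0xA9.
        byte[] utf8 = accented.getBytes(StandardCharsets.UTF_8);
        // Pure US-ASCII text uses only bytes below 0x80.
        byte[] ascii = "cafe".getBytes(StandardCharsets.US_ASCII);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1)); // false
        System.out.println("UTF-8 bytes valid as UTF-8:      " + isValidUtf8(utf8));   // true
        System.out.println("US-ASCII bytes valid as UTF-8:   " + isValidUtf8(ascii));  // true
    }
}
```

In UTF-8 the byte 0xE9 is the lead byte of a three-byte sequence, so a decoder that meets it in ISO-8859-1 data expects continuation bytes that never arrive. That is the kind of failure the subject line of this thread describes.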