Subject: Re: DOCBOOK-APPS: Bad Continuation of Multi-Byte UTF-8 Sequence
To Walsh's comment:

> >Encoding can be specified by this way for external parsed entities,
> >version pseudoattribute is optional - moreover some XML processors are
> >unable to process external entity if it contains version information in
> >its declaration.

Pawson-san wrote:

> Surely this is a weakness in the XML spec then? I'm stuffed if I need
> an external parsed entity in a different encoding?

While the encoding declaration is part of the specification, a processor
is only required to support UTF-8 and UTF-16; support for any other
encoding is optional. Saxon, for example, only supports UTF-8, US-ASCII,
and ISO-8859-1 (the last two covering only a subset of the characters
UTF-8 can represent).

You must not have to deal with languages that have multiple encodings.

The reason I prefer to use Xalan/Xerces over Saxon is this very issue:
the Apache XML/XSL tools allow the encoding to be specified on a
per-document basis. The loss in speed is made up for in versatility.

This lets me take a document that an engineer produced on a Windows box
in Shift_JIS and process it on my FreeBSD box with an XSL(T) stylesheet
encoded in EUC-JP. (For HTML, I often have the output encoding set in
the XSL to be ISO-2022-JP.)

I was recently told (but didn't confirm) that Danish has a number of
different encodings as well, depending on platform.

Where i18n and l10n are concerned, this is a strength in the XML spec,
not a weakness.

-- 
Michael Westbay
Work: Beacon-IT http://www.beacon-it.co.jp/
Home: http://www.seaple.icc.ne.jp/~westbay
Commentary: http://www.japanesebaseball.com/
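
As a rough illustration of why per-document encodings matter in the message above, here is a minimal Python sketch. It is not how Xalan, Xerces, or Saxon work internally; the sample string and the helper names `decodes_as_utf8` and `transcode` are made up for this example. It shows that the same Japanese text yields different bytes in Shift_JIS, EUC-JP, and UTF-8, that reading legacy-encoded bytes as UTF-8 fails (the kind of failure the subject line describes), and that converting between encodings works once the source encoding is known.

```python
# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# The same characters produce different byte sequences in each encoding.
sjis_bytes = text.encode("shift_jis")
euc_bytes = text.encode("euc_jp")
utf8_bytes = text.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # Raised, for example, on a bad start or continuation byte.
        return False


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from a known source encoding into a target encoding."""
    return data.decode(source).encode(target)


if __name__ == "__main__":
    print("Shift_JIS read as UTF-8 is valid:", decodes_as_utf8(sjis_bytes))
    print("EUC-JP read as UTF-8 is valid:", decodes_as_utf8(euc_bytes))
    converted = transcode(sjis_bytes, "shift_jis", "euc_jp")
    print("Shift_JIS -> EUC-JP matches:", converted == euc_bytes)
```

The point of the sketch is that the bytes alone do not say which encoding they are in; a tool has to be told, which is what a per-document encoding declaration does.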