Subject: Re: DOCBOOK-APPS: Choosing a characterset for DocBook
Christopher R. Maden wrote at 15 Mar 2002 02:06:47 -0800:
> The parser obviously is not aware that you have chosen ISO 8859-1. That is
> the expected error message if an 8859-1 document contains any high bytes
> (128+) and the parser is trying to parse it as UTF-8.
>
> 1) Do all of your entities (i.e., files) have encoding declarations? What
> are they? Remember that UTF-8 is the default unless you explicitly specify
> a different encoding (or use a byte-order mark, in which case UTF-16 is the
> default).

Strictly speaking, it's "or use UTF-16 with a byte-order mark", since you can have a byte-order mark with UTF-8.

UTF-16 without a byte-order mark (BOM) can be mistaken for a number of other encodings, hence you need the BOM if you're omitting the encoding declaration. UTF-16 without the BOM, and each of those other encodings, needs the encoding declaration so that the XML processor can determine the encoding. UTF-16 with both the BOM and an encoding declaration is okay, too.

8-bit text without an encoding declaration is expected to be UTF-8. Hence, if the text isn't UTF-8, you need the encoding declaration.

UTF-8 text with the BOM (EF BB BF) and without an encoding declaration should be recognised as UTF-8. However, using the BOM with UTF-8 wasn't mentioned in the Unicode Standard, Version 2.0 (which was current when XML 1.0 was published), so some early XML processors weren't designed to recognise the UTF-8 BOM. The UTF-8 BOM was not mentioned in Appendix F of XML 1.0, but is mentioned in Appendix F of XML 1.0 Second Edition (and was mentioned in the version of ISO/IEC 10646 current when XML 1.0 was published).

Regards,

Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin        mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd          Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3    x(70)19708
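
As a rough illustration of the rules in the message above, here is a minimal Python sketch of how a processor might pick an encoding for an entity. The function name `sniff_xml_encoding` and the regular expression are hypothetical and not taken from any XML library. The sketch covers only the cases discussed here (UTF-8 BOM, UTF-16 BOM, an explicit encoding declaration, and the UTF-8 default); it does not attempt the full autodetection of Appendix F, such as UCS-4, EBCDIC, or UTF-16 without a BOM.

```python
import re

# Hypothetical helper for illustration only; not part of any XML library.
# Matches an encoding declaration at the very start of an ASCII-compatible entity.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes (simplified)."""
    # UTF-8 with a BOM (EF BB BF) and no declaration should be read as UTF-8.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # UTF-16 with a BOM, in either byte order.
    # Assumption: UCS-4/UTF-32 input is out of scope for this sketch.
    if data.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"
    # Otherwise rely on an explicit encoding declaration, if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # 8-bit text without a declaration is expected to be UTF-8.
    return "utf-8"
```

With this sketch, an ISO 8859-1 file that lacks a declaration, for example `b"<doc>caf\xe9</doc>"`, is reported as UTF-8, and decoding it as UTF-8 then fails on the high byte. That is the situation described in the quoted text; adding `encoding="ISO-8859-1"` to the XML declaration makes the same bytes resolve to `iso-8859-1`.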