docbook-apps message

Subject: Re: DOCBOOK-APPS: Choosing a characterset for DocBook


Christopher R. Maden wrote at 15 Mar 2002 02:06:47 -0800:
 > The parser obviously is not aware that you have chosen ISO 8859-1.  That is 
 > the expected error message if an 8859-1 document contains any high bytes 
 > (128+) and the parser is trying to parse it as UTF-8.
 > 
 > 1) Do all of your entities (i.e., files) have encoding declarations?  What 
 > are they?  Remember that UTF-8 is the default unless you explicitly specify 
 > a different encoding (or use a byte-order mark, in which case UTF-16 is the 
 > default).

Strictly speaking, it's "or use UTF-16 with a byte-order mark", since
you can have a byte-order mark with UTF-8.

UTF-16 without a byte-order mark (BOM) can be mistaken for a number of
other encodings, hence you need the BOM if you're omitting the
encoding declaration.  UTF-16 without the BOM, like each of those
other encodings, needs an encoding declaration so the XML processor
can determine the encoding.  UTF-16 with both the BOM and an encoding
declaration is okay, too.
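
As a rough illustration of why the BOM matters for UTF-16, here is a
minimal Python sketch of the check a processor could make on the first
two bytes.  The function name utf16_byte_order is made up for this
example, and the check is deliberately simplified: it ignores UCS-4,
whose little-endian BOM (FF FE 00 00) starts with the same two bytes as
the UTF-16 little-endian BOM.

```python
import codecs


def utf16_byte_order(data: bytes):
    """Return 'utf-16-be' or 'utf-16-le' if data starts with a UTF-16 BOM.

    Returns None when no UTF-16 BOM is present, in which case the
    encoding has to come from somewhere else (for example an encoding
    declaration).  Hypothetical helper, not taken from any real parser.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return None


doc = "<doc/>"
with_bom = codecs.BOM_UTF16_BE + doc.encode("utf-16-be")
without_bom = doc.encode("utf-16-be")

print(utf16_byte_order(with_bom))
print(utf16_byte_order(without_bom))
```

Without the BOM the function has nothing to go on, which is the case
where the encoding declaration becomes necessary.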

8-bit text without an encoding declaration is expected to be UTF-8.
Hence, if the text isn't UTF-8, you need the encoding declaration.
UTF-8 text with the BOM (EF BB BF) and without an encoding declaration
should be recognised as UTF-8.  However, using the BOM with UTF-8
wasn't mentioned in the Unicode Standard, Version 2.0 (which was
current when XML 1.0 was published), so some early XML processors
weren't designed to recognise the UTF-8 BOM.  The UTF-8 BOM was not
mentioned in Appendix F of XML 1.0, but is mentioned in Appendix F of
XML 1.0 Second Edition (and was mentioned in the version of ISO/IEC
10646 current when XML 1.0 was published).
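
The three 8-bit cases can be tried out with a short Python sketch.  It
assumes the expat-based parser behind Python's standard
xml.etree.ElementTree module, which is only one example of a current
XML processor; the sample documents and the try_parse helper are
invented for the illustration.

```python
import xml.etree.ElementTree as ET

# ISO 8859-1 bytes (E9 is e-acute) with no encoding declaration:
# the parser assumes UTF-8 and should reject the high byte.
latin1_no_decl = b"<p>caf\xe9</p>"

# The same bytes, this time with an encoding declaration.
latin1_decl = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'

# UTF-8 bytes preceded by the UTF-8 BOM (EF BB BF), no declaration.
utf8_bom = b"\xef\xbb\xbf<p>caf\xc3\xa9</p>"


def try_parse(data: bytes):
    """Return the root element's text, or None if the parse fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


for name, data in [("ISO 8859-1, no declaration", latin1_no_decl),
                   ("ISO 8859-1, with declaration", latin1_decl),
                   ("UTF-8 with BOM", utf8_bom)]:
    print(name, "->", repr(try_parse(data)))
```

Only the first document should fail to parse, since it is the only one
where the processor has no way to learn that the bytes aren't UTF-8.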

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin                mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

