Subject: Re: [xmlvoc-comment] character encoding? (PSI question)
* Patrick Durusau
|
| Question: PSI - character encoding?
|
| I suspect what was meant is covered by the Unicode standard as:
|
| Character Encoding Scheme: A character encoding form plus byte
| serialization. There are seven character encoding schemes in Unicode:
| UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE and UTF-32LE.

This was what I had in mind with "character encoding", yes.

Many people use a mental ontology that divides things into character
encodings and character sets, but Unicode Technical Report #17[1] goes
further and divides it into five levels. I'm not sure how detailed we
want to be.

It seems to me that our character encoding corresponds to their CES
and our character set to their CCS. I don't see much of a need for
their ACR, while their CEF would allow us to distinguish between
UTF-16 and UTF-16LE/UTF-16BE, but wouldn't do much else.

To me it seems that what we can take away from UTR #17 is:

 a) better definitions of character encoding and character set,
 b) the notion of a Transfer Encoding Syntax (such as base64 and
    UUencoding), and
 c) the notion of a Character Encoding Form.

I like a), think I like b) and am open to c).

[1] <URL: http://www.unicode.org/reports/tr17/ >

--
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >
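A minimal Python sketch, not part of the original thread, illustrating the CES and TES levels discussed above: it serializes one character under each of the seven schemes listed in the quoted definition, then applies base64 as a transfer encoding syntax on top of one of them. The mapping from scheme names to Python codec names is this sketch's own assumption. Python's `utf-16` and `utf-32` codecs prepend a byte order mark in the machine's native order, which is what separates the UTF-16 scheme from UTF-16LE/UTF-16BE (fixed byte order, no BOM).

```python
import base64

# The seven Unicode character encoding schemes (CES) from the quoted
# definition, mapped to Python codec names (mapping chosen for this sketch).
SCHEMES = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",      # BOM + native byte order
    "UTF-16BE": "utf-16-be",  # big-endian, no BOM
    "UTF-16LE": "utf-16-le",  # little-endian, no BOM
    "UTF-32": "utf-32",      # BOM + native byte order
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}


def serialize(text: str) -> dict:
    """Return the byte serialization of `text` under each scheme."""
    return {name: text.encode(codec) for name, codec in SCHEMES.items()}


def transfer_encode(data: bytes) -> str:
    """Apply base64, a transfer encoding syntax (TES), to bytes from a CES."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    for name, data in serialize("\u00c5").items():
        print(f"{name:9} {data.hex(' ')}")
    print("UTF-8 + base64:", transfer_encode("\u00c5".encode("utf-8")))
```

Run as a script, it prints one hex line per scheme. For "Å" (U+00C5), UTF-16BE gives `00 c5` and UTF-16LE gives `c5 00`: the same 16-bit code unit from the UTF-16 encoding form, serialized two ways.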