




Subject: Re: [xmlvoc-comment] character encoding? (PSI question)

* Patrick Durusau
| Question: PSI - character encoding?
| I suspect what was meant is covered by the Unicode standard as:
| Character Encoding Scheme: A character encoding form plus byte
| serialization. There are seven character encoding schemes in Unicode:
| UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE and UTF-32LE.

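This was what I had in mind with "character encoding", yes. Many
people use a mental ontology that divides things into character
encodings and character sets, but Unicode Technical Report #17[1] goes
further and divides the model into five levels: Abstract Character
Repertoire (ACR), Coded Character Set (CCS), Character Encoding Form
(CEF), Character Encoding Scheme (CES) and Transfer Encoding Syntax
(TES).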

I'm not sure how detailed we want to be. It seems to me that our
"character encoding" corresponds to their CES and our "character set"
to their CCS. I don't see much need for their ACR. Their CEF would
allow us to distinguish between UTF-16 and UTF-16LE/UTF-16BE, but
wouldn't do much else.
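To make the CEF/CES split concrete, here is a rough Python sketch (my
own illustration, not taken from UTR #17; U+1F600 is an arbitrary
character picked because UTF-16 needs a surrogate pair for it). At the
CEF level UTF-16 is just a sequence of 16-bit code units; UTF-16BE,
UTF-16LE and UTF-16 are different ways of serializing those same units
into bytes:

```python
import struct

# Arbitrary example character outside the BMP (editor's choice).
ch = "\U0001F600"

# CEF level: UTF-16 as a sequence of 16-bit code units. We derive them
# from the big-endian bytes so the result does not depend on the host.
be = ch.encode("utf-16-be")
code_units = list(struct.unpack(f">{len(be) // 2}H", be))
print([hex(u) for u in code_units])   # ['0xd83d', '0xde00']

# CES level: the same two code units serialized to bytes.
print(ch.encode("utf-16-be").hex())   # d83dde00
print(ch.encode("utf-16-le").hex())   # 3dd800de

# The plain "utf-16" scheme prepends a byte order mark. Python writes it
# in the machine's native order, so only its presence is portable.
bom = ch.encode("utf-16")[:2]
print(bom in (b"\xfe\xff", b"\xff\xfe"))  # True
```

The code units are identical in all three cases; only the byte order
(and the optional BOM) differs, which is exactly what the CES level adds.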

To me it seems that what we can take away from UTR #17 is:

 a) better definitions of character encoding and character set,

 b) the notion of a Transfer Encoding Syntax (such as base64 and
    UUencoding), and

 c) the notion of a Character Encoding Form.

I like a), think I like b) and am open to c).
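A sketch of how the layers stack up, again in Python and assuming a
single example character (é, chosen arbitrarily): the CCS assigns a
code point, UTF-8 turns it into bytes, and base64 is applied on top of
those bytes as a transfer encoding syntax that knows nothing about
characters:

```python
import base64
import unicodedata

# Arbitrary example character (editor's choice).
ch = "\u00e9"

# ACR: the abstract character; we identify it by its Unicode name.
print(unicodedata.name(ch))        # LATIN SMALL LETTER E WITH ACUTE

# CCS: the coded character set assigns it a code point.
print(f"U+{ord(ch):04X}")          # U+00E9

# CEF + CES: UTF-8 maps the code point to bytes. UTF-8 code units are
# single bytes, so there is no byte order to choose at the CES level.
utf8 = ch.encode("utf-8")
print(utf8.hex())                  # c3a9

# TES: base64 re-encodes the bytes for a 7-bit-safe channel.
b64 = base64.b64encode(utf8)
print(b64)                         # b'w6k='

# Undoing the layers in reverse order recovers the character.
assert base64.b64decode(b64).decode("utf-8") == ch
```

The point for b) is that the TES step operates purely on bytes, so it is
orthogonal to the character encoding underneath it.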

[1] <URL: http://www.unicode.org/reports/tr17/ >

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >

