Subject: RE: [oic] interoperability and non-latin scripts
Mingfei Jia (贾明飞),

That is a great list. On the character encoding issues, I have some questions.

1. Do all characters in the GB18030 and GB2312 character sets have corresponding code points in Unicode [4.0? 5.0?]?

2. I am asking because XML is specified in terms of Unicode no matter what the encoding parameter is. I understand one might want to say encoding="GB2312" to ensure that text is confined to the characters and encodings of that specification, so that it is useful for entry, display, printing and processing outside of the ODF package. Having a reliable "standard" mapping to Unicode is valuable, if available. (It also matters which version of XML 1.0 we specify as normative for ODF, in terms of what can appear in special types, such as xml:id, NCNames, etc.)

3. How do you see this impacting the use of IRIs and "full-path" names of Zip items? Can the "full-path" be carried in UTF-8 even though the coded characters are meant to be limited to those of GB2312 or GB18030? Likewise, would you expect that manifest.xml could have encoding="GB2312" (for example)?

4. Are GB2312 and GB18030 what are known as double-byte encodings? Is it possible to detect when an XML file is in such an encoding in order to process the XML prologue correctly (so that the encoding parameter can be detected and read)? (Put differently, is ISO 646 [a.k.a. 7-bit ASCII] a subset of the GB encodings, so that the XML prologue is readable correctly so long as non-646 characters do not appear?)

5. Finally, are there useful English-language descriptions or translations of the GB2312 and GB18030 standards that you can refer us to on-line?

I am demonstrating my ignorance of these matters. Your further guidance will be valuable and very welcome.

 - Dennis

Dennis E. Hamilton
------------------
NuovoDoc: Design for Document System Interoperability
mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430
http://NuovoDoc.com http://ODMA.info/dev/ http://nfoWorks.org

-----Original Message-----
From: Ming Fei Jia [mailto:jiamingf@cn.ibm.com]
http://lists.oasis-open.org/archives/oic/200902/msg00025.html
Sent: Sunday, February 15, 2009 00:30
To: Hanssens Bart
Cc: oic@lists.oasis-open.org
Subject: Re: [oic] interoperability and non-latin scripts

[ ... ]

As to special interoperability issues arising from non-Latin scripts, what I can see now is listed here for members' information. The major current ODF products support only some of these:

(1) Character encoding issues. For example, although many Chinese documents use Unicode, many others are still encoded in the China national standard GB18030 or the older national standard GB2312. ODF applications need to support these encodings so that they can show Chinese documents correctly.

[ ... ]
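
On question 4: both GB2312 (in its usual EUC-CN byte form) and GB18030 encode the ASCII range as the same single bytes, so the XML declaration can be read byte-wise before the rest of the document is decoded. A minimal Python sketch of that idea follows. The names sniff_xml_encoding and decode_xml are hypothetical, and the sketch assumes an ASCII-compatible encoding with no byte-order mark; it does not cover UTF-16 or the full autodetection procedure described in the XML 1.0 specification.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration in raw bytes.
# Assumption: the declaration itself is pure ASCII, which holds for
# ASCII-compatible encodings such as GB2312 (EUC-CN), GB18030 and UTF-8.
_XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def sniff_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default`."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else default


def decode_xml(data: bytes) -> str:
    """Decode the whole document using the declared (or default) encoding."""
    return data.decode(sniff_xml_encoding(data))


# Illustrative sample only: a tiny document saved as GB2312 bytes.
text = '<?xml version="1.0" encoding="GB2312"?><p>中文</p>'
raw = text.encode("gb2312")

declared = sniff_xml_encoding(raw)
roundtrip = decode_xml(raw)
```

The same bytes decode identically when the declaration says GB18030, since Python's gb18030 codec accepts GB2312-encoded text; once decoded, the content is ordinary Unicode, which is what an XML processor works with regardless of the declared encoding.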