oic message


Subject: RE: [oic] interoperability and non-latin scripts


Mingfei Jia (贾明飞),

That is a great list.  

For Character encoding issues, I have some questions.

1. Do GB18030 and GB2312 character-set encodings all have corresponding
character encodings in Unicode [4.0? 5.0?]?
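
One way to probe question 1 is to round-trip text through the GB codecs. The following is a minimal sketch that assumes the `gb2312` and `gb18030` codecs shipped with Python are a fair stand-in for the standard mapping tables; the sample strings are illustrative only and it says nothing about which Unicode version a given GB edition tracks.

```python
# Round-trip sample text between Unicode and the GB encodings.
text = "中文"  # sample data only: two common Chinese characters

for codec in ("gb2312", "gb18030"):
    raw = text.encode(codec)           # Unicode -> GB bytes
    assert raw.decode(codec) == text   # GB bytes -> Unicode, unchanged
    print(codec, raw.hex(), [f"U+{ord(c):04X}" for c in text])

# A character outside the Basic Multilingual Plane (CJK Extension B).
rare = "\U00020000"
gb18030_bytes = rare.encode("gb18030")           # four-byte GB18030 form
roundtrip_ok = gb18030_bytes.decode("gb18030") == rare

# GB2312 has a fixed, much smaller repertoire, so this is expected to fail.
try:
    rare.encode("gb2312")
    gb2312_can_encode = True
except UnicodeEncodeError:
    gb2312_can_encode = False
```

Under these assumptions every GB2312 character has a Unicode counterpart, but the reverse holds only for GB18030, which uses four-byte sequences for code points beyond its two-byte range.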

2. I am asking because XML is specified in terms of Unicode no matter what
the encoding parameter is.  I understand one might want to say
encoding="GB2312" to ensure that text is confined to the characters and
encodings of that specification to be useful in entry, display, printing and
processing outside of the ODF package.   Having a reliable "standard"
mapping to Unicode is valuable, if available.  (It also matters what version
of XML 1.0 we specify as normative for ODF, in terms of what can appear in
special types, such as xml:id, NCNAMEs, etc.)

3. How do you see this impacting use of IRIs and "full-path" names of Zip
items?  Can the "full-path" be carried in UTF-8 even though the coded
characters are meant to be limited to those of GB2312 or GB18030?  Likewise,
would you expect that manifest.xml could have encoding="GB2312" (for
example)?
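
For question 3, the sketch below shows one possible arrangement: the Zip item name is stored as UTF-8 while its characters stay within the GB2312 repertoire. It relies on Python's standard `zipfile` module, which sets the Zip "language encoding" flag (bit 11) for non-ASCII names; the item path is hypothetical, and nothing here is taken from the ODF package specification.

```python
import io
import zipfile

# Hypothetical item path whose characters all exist in GB2312.
member_name = "图片/中文.xml"
fits_gb2312 = member_name.encode("gb2312").decode("gb2312") == member_name

# Write an in-memory Zip archive containing one item with that name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(member_name, "<doc/>")

# Read it back and inspect how the name was recorded.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    utf8_flag_set = bool(info.flag_bits & 0x800)  # bit 11: name is UTF-8
    stored_name = info.filename
```

In this arrangement the restriction to GB2312 characters is a property of the text itself, not of the byte encoding used to carry the name.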

4. Are GB2312 and GB18030 what are known as double-byte encodings?  Is it
possible to detect when an XML file is in such an encoding in order to
correctly process the XML prologue (so the encoding parameter can be
detected and read)?  (Put differently, is ISO 646 [a.k.a. 7-bit ASCII] a
subset of the GB encodings so the XML prologue is readable correctly so long
as non-646 characters do not appear?)
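
A small experiment bearing on question 4, again assuming Python's `gb2312` codec reflects the standard: it checks that an all-ASCII XML declaration is byte-for-byte the same in GB2312 as in ASCII, then reads the declared encoding before decoding the rest. The function `sniff_declared_encoding` is a hypothetical helper written for this illustration, not part of any library, and it handles only ASCII-compatible encodings.

```python
import re

decl = '<?xml version="1.0" encoding="GB2312"?>'
doc = decl + "<p>中文</p>"
raw = doc.encode("gb2312")

# ASCII characters keep their single-byte values in GB2312, so the
# declaration can be read before the real encoding is known.
ascii_prefix_identical = raw.startswith(decl.encode("ascii"))

def sniff_declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: return the encoding named in the XML
    declaration, or `default` if there is none."""
    m = re.match(
        rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""",
        data,
    )
    return m.group(1).decode("ascii") if m else default

declared = sniff_declared_encoding(raw)
decoded = raw.decode(declared)  # decode the whole document with that name
```

This does not cover UTF-16 or EBCDIC input; the non-normative appendix on autodetection of character encodings in the XML 1.0 Recommendation describes the fuller procedure based on the first bytes of the file.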

5. Finally, are there useful English-language descriptions or translations
of the GB2312 and GB18030 standards that you can refer us to on-line?

I am demonstrating my ignorance of these matters.  Your further guidance
will be valuable and very welcome.

 - Dennis

Dennis E. Hamilton
------------------
NuovoDoc: Design for Document System Interoperability 
mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430 
http://NuovoDoc.com http://ODMA.info/dev/ http://nfoWorks.org 


-----Original Message-----
From: Ming Fei Jia [mailto:jiamingf@cn.ibm.com] 
http://lists.oasis-open.org/archives/oic/200902/msg00025.html
Sent: Sunday, February 15, 2009 00:30
To: Hanssens Bart
Cc: oic@lists.oasis-open.org
Subject: Re: [oic] interoperability and non-latin scripts

[ ... ]

As to special interoperability issues arising from non-Latin scripts, the
ones I can see now are listed here for members' information. Current major
ODF products support only some of them:

(1) Character encoding issues. For example, although many Chinese documents
use Unicode, many others are still encoded in the Chinese national standard
GB18030 or the older national standard GB2312. ODF applications need to
support these encodings so that they can display Chinese documents
correctly.

[ ... ]
