OASIS Mailing List Archives

oic message



Subject: RE: [oic] interoperability and non-latin scripts


Dennis,

First, to clarify: when I mentioned the Chinese encoding issues, I meant that we should add tests for documents in non-Latin encodings. Those encodings are in fact compatible with Unicode. If ODF allows non-Unicode encodings and ODF applications support them, there should be no issues.

> From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
> Subject: RE: [oic] interoperability and non-latin scripts
>
> Mingfei Jia (贾明飞),
>
> That is a great list.  
>
> For Character encoding issues, I have some questions.
>
> 1. Do GB18030 and GB2312 character-set encodings all have corresponding
> character encodings in Unicode [4.0? 5.0?]?

Yes. Every GB18030 character has a corresponding code point in Unicode 5.0. GB2312 is a subset of GB18030.
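
As a rough illustration of the subset relationship, here is a minimal Python sketch using the standard "gb2312" and "gb18030" codecs. The sample strings are assumptions chosen only for the demonstration, and the codec tables reflect whatever Unicode version the Python build ships with, not necessarily Unicode 5.0.

```python
# Sample text is an assumption for illustration; any GB2312 text would do.
sample = "\u4e2d\u6587"  # the two characters of "Chinese (language)"

gb2312_bytes = sample.encode("gb2312")
gb18030_bytes = sample.encode("gb18030")

# Text within the GB2312 repertoire gets the same bytes under GB18030.
same_bytes = gb2312_bytes == gb18030_bytes

# Bytes written as GB2312 can therefore be read back with a GB18030 decoder.
round_trip = gb2312_bytes.decode("gb18030")

# A character outside GB2312 (here one from a supplementary plane) still
# has a GB18030 form, encoded as a four-byte sequence.
outside = "\U0001F600"
four_byte_len = len(outside.encode("gb18030"))

try:
    outside.encode("gb2312")
    gb2312_can_encode = True
except UnicodeEncodeError:
    gb2312_can_encode = False

print(same_bytes, round_trip == sample, four_byte_len, gb2312_can_encode)
```

A test document for non-Latin encodings could be checked in the same way: encode it, decode it, and compare against the Unicode original.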
>
> 2. I am asking because XML is specified in terms of Unicode no matter what
> the encoding parameter is.  I understand one might want to say
> encoding="GB2312" to ensure that text is confined to the characters and
> encodings of that specification to be useful in entry, display, printing and
> processing outside of the ODF package.   Having a reliable "standard"
> mapping to Unicode is valuable, if available.  (It also matters what version
> of XML 1.0 we specify as normative for ODF, in terms of what can appear in
> special types, such as xml:id, NCNAMEs, etc.)
>
> 3. How do you see this impacting use of IRIs and "full-path" names of Zip
> items?  Can the "full-path" be carried in UTF-8 even though the coded
> characters are meant to be limited to those of GB2312 or GB18030?  Likewise,
> would you expect that manifest.xml could have encoding="GB2312" (for
> example)?

You mean mixed encodings within one XML file, where "full-path" is encoded in UTF-8 and the rest of the text in GB2312 or GB18030? I have not verified that case; even if it works, I would not prefer it. In general, I think the whole text should use a single encoding. If an IRI is in a non-Unicode encoding and the ODF application supports that encoding, it should work. Of course, the OS also needs to support that encoding.
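
The Zip side of this question can be checked separately from the XML side. The sketch below uses Python's standard zipfile module to write an item with a non-ASCII name and then inspects general-purpose flag bit 11 (0x800), which the Zip format uses to mark a name as UTF-8. The item name and content are hypothetical, and this shows only how one Zip library behaves, not what ODF requires.

```python
import io
import zipfile

# General-purpose bit 11 of a Zip entry: the file name is stored as UTF-8.
UTF8_NAME_FLAG = 0x800

# Hypothetical item name containing characters from the GB2312 repertoire.
item_name = "Pictures/\u4e2d\u6587.xml"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(item_name, b"<placeholder/>")

# Read the package back and look at how the name was recorded.
with zipfile.ZipFile(buffer) as package:
    info = package.infolist()[0]
    stored_name = info.filename
    name_is_utf8 = bool(info.flag_bits & UTF8_NAME_FLAG)

print(stored_name == item_name, name_is_utf8)
```

Here the name is carried in UTF-8 even though every character in it also exists in GB2312, which is the situation the question describes.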
>
> 4. Are GB2312 and GB18030 what are known as double-byte encodings?  Is it
> possible to detect when an XML file is in such an encoding in order to
> correctly process the XML prologue (so the encoding parameter can be
> detected and read)?  (Put differently, is ISO 646 [a.k.a. 7-bit ASCII] a
> subset of the GB encodings so the XML prologue is readable correctly so long
> as non-646 characters do not appear?)

Yes, GB2312 is a double-byte encoding, and GB18030 extends it with four-byte sequences. Detecting the encoding should be possible, but it depends on the implementation. Many applications can do this, e.g. IE and Firefox.
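
For the XML prologue specifically, one simple approach relies on ASCII characters keeping their single-byte values in both GB encodings, so the declaration can be matched on the raw bytes. The Python sketch below is only an illustration under that assumption: the function name and the sample document are hypothetical, and it ignores byte-order marks and UTF-16/UTF-32 input, which a real parser must also handle.

```python
import re

# Match an XML declaration at the very start of the byte stream and capture
# the value of its encoding pseudo-attribute. Works on raw bytes because the
# declaration itself contains only ASCII characters.
_DECLARATION = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the declared encoding name, or the default if none is declared."""
    match = _DECLARATION.match(data)
    return match.group(1).decode("ascii") if match else default


# Hypothetical document, written out as GB2312 bytes.
xml_text = '<?xml version="1.0" encoding="GB2312"?>\n<p>\u4e2d\u6587</p>'
raw = xml_text.encode("gb2312")

declared = sniff_declared_encoding(raw)
decoded = raw.decode(declared)

print(declared, decoded == xml_text)
```

Once the declared name is known, the rest of the file can be decoded with the matching codec, as the last lines show.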
>
> 5. Finally, are there useful English-language descriptions or translations
> of the GB2312 and GB18030 standards that you can refer us to on-line?

You can refer to this link: http://en.wikipedia.org/wiki/GB_18030
>
> I am demonstrating my ignorance of these matters.  Your further guidance
> will be valuable and very welcome.
>
>  - Dennis
>
> Dennis E. Hamilton
> ------------------


