Subject: RE: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)
Issue 1: For completeness, I lean towards Alt 2, since there are some other errata that need to be created for WebCGM 1.0 that we haven't done yet, but I'm not hard over on it.

Issue 2: Alt 3 - deprecate and correct

thx...Dave

-----Original Message-----
From: Lofton Henderson [mailto:lofton@rockynet.com]
Sent: Thursday, June 23, 2005 4:29 PM
To: cgmo-webcgm@lists.oasis-open.org
Subject: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)

WebCGM TC,

I have an action item to research "UTF-x sequence tails". Thanks to Forrest for providing me some references and some motivation, I have gotten the information, and I make recommendations below.

[1] http://www.unihan.com.cn/Cjk/ana18.htm
[2] http://www.unihan.com.cn/Cjk/ana19.htm

At [1] and [2], we find the ISO/IEC 2022 escape sequences:

UTF-8 implementation level 3: ESC 2/5 2/15 4/9
UTF-16 implementation level 3: ESC 2/5 2/15 4/12

At [3], I found a lucid explanation of this stuff, and particularly what "implementation level 1,2,3" mean. In the past, we chose implementation level 3 (whether or not it was a well-considered decision is another question).

[3] http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html#3

Separate the cases of non-graphical text (SF) and graphical text (S) in WebCGM 1.0.

Non-graphical text (SF): T.14.5
-----
T.14.5 says that the metafile id (BEGIN METAFILE parameter, type SF) shall have as its first 4 octets the 4-octet sequences above, to declare for the whole metafile that SF is UTF-8 or UTF-16.

Conclusion: no problem here.

Graphical text (S): T.16.14
-----
T.16.14 takes the last character of the above 4-octet sequences as the 'tail', for use in the CHARACTER SET LIST (CSL) element. So the two-part data for CSL are specified as:

UTF-8 implementation level 3: 'complete code', 4/9
UTF-16 implementation level 3: 'complete code', 4/12

This was based on information in CGM:1999 section 6.3.4.3, which characterizes the escape sequences for complete codes as: ESC 2/5 I* F. I* is zero or more "intermediate characters", and F is a single final character. WebCGM 1.0 took only F for the tail. But CGM:1999 says:

>The character set declaration ... consists of 'complete code' followed by
>a string consisting of those characters in the code's ISO 2022 escape
>sequence which come after the first two characters, ESC 2/5.

Conclusion: WebCGM 1.0 is wrong for the CSL tails for UTF-8 and UTF-16. The CSL data should be:

UTF-8 implementation level 3: 'complete code', 2/15 4/9
UTF-16 implementation level 3: 'complete code', 2/15 4/12

ISSUES:
===

ISSUE 1: Should we issue an erratum for WebCGM 1.0?

Alternatives:
Alt.1: No
Alt.2: Yes

Recommendation for Issue 1: Alt.1, No.

Discussion: CGM:1999 CSL is not really an implementation of ISO 2022, but rather takes concepts and bits of escape sequences as parameters for the CSL to designate character sets. WebCGM 1.0 lists data that is to be used to designate 6 character sets. Though wrong according to ISO 2022, these are effectively just tokens to select the 6 char. sets, and they are unambiguous in the context of WebCGM. To change WebCGM 1.0 by erratum would invalidate existing WebCGM 1.0 products in the field, for new WebCGM 1.0 content. And it would cause existing "valid" 1.0 content to become invalid. It's not worth it, IMO.

ISSUE 2: Should we correct it for WebCGM 2.0?

Alternatives:
Alt.1: No
Alt.2: Yes
Alt.3: Yes, but do it by deprecation of the old 1.0 forms (2.0 generators shall generate only the 2.0 forms, 2.0 viewers shall accept 1.0 forms as well as 2.0 forms)

Recommendation for Issue 2: Alt.3, Yes, but by deprecation of old.

Discussion: If generators are writing 2.0 files, and they put out the proper forms, then there really shouldn't be a problem with old (1.0) viewers in the field -- they won't understand other 2.0 stuff anyway. 2.0 generators and 2.0 viewers will be using "correct" forms.

Thoughts?

-Lofton.
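
Editorial illustration (not part of the original message): the difference between the WebCGM 1.0 tails and the CGM:1999 tails can be checked mechanically. The Python sketch below converts the ISO 2022 column/row notation quoted above into octets, then derives the tail both ways. All function names are hypothetical, chosen for this example, and `viewer_accepts` is only one possible reading of Alt.3, not text from either specification.

```python
ESC = 0x1B  # ISO 2022 ESCAPE octet


def parse_2022(notation: str) -> bytes:
    """Convert column/row notation such as 'ESC 2/5 2/15 4/9' to octets."""
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(ESC)
        else:
            col, row = token.split("/")
            # Column/row notation: octet value = column * 16 + row.
            out.append(int(col) * 16 + int(row))
    return bytes(out)


def tail_cgm1999(seq: bytes) -> bytes:
    """CGM:1999 rule: everything after the first two characters, ESC 2/5."""
    if seq[:2] != bytes([ESC, 0x25]):
        raise ValueError("not a complete-code escape sequence (ESC 2/5 ...)")
    return seq[2:]


def tail_webcgm10(seq: bytes) -> bytes:
    """WebCGM 1.0 rule as described in the message: final character F only."""
    return seq[-1:]


def viewer_accepts(tail: bytes, seq: bytes) -> bool:
    """Assumed Alt.3 behaviour for a 2.0 viewer: accept either form."""
    return tail in (tail_cgm1999(seq), tail_webcgm10(seq))


UTF8_L3 = parse_2022("ESC 2/5 2/15 4/9")
UTF16_L3 = parse_2022("ESC 2/5 2/15 4/12")

for name, seq in (("UTF-8", UTF8_L3), ("UTF-16", UTF16_L3)):
    print(name, "1.0 tail:", tail_webcgm10(seq), "CGM:1999 tail:", tail_cgm1999(seq))
```

Under Alt.3, a 2.0 generator would write only the `tail_cgm1999` form, while a 2.0 viewer would accept both, as `viewer_accepts` does.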