Subject: RE: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)
Issue 1: Alt 1, no
Issue 2: Alt 3, deprecate and change to new form

> -----Original Message-----
> From: Lofton Henderson [mailto:lofton@rockynet.com]
> Sent: Friday, June 24, 2005 1:29 AM
> To: cgmo-webcgm@lists.oasis-open.org
> Subject: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)
>
> WebCGM TC,
>
> I have an action item to research "UTF-x sequence tails". Thanks to
> Forrest for providing me some references and some motivation, I have
> gotten the information, and I make recommendations below.
>
> [1] http://www.unihan.com.cn/Cjk/ana18.htm
> [2] http://www.unihan.com.cn/Cjk/ana19.htm
>
> At [1] and [2], we find the ISO/IEC 2022 escape sequences:
>
> UTF-8 implementation level 3: ESC 2/5 2/15 4/9
> UTF-16 implementation level 3: ESC 2/5 2/15 4/12
>
> At [3], I found a lucid explanation of this stuff, and particularly what
> "implementation level 1,2,3" mean. In the past, we chose implementation
> level 3 (whether or not it was a well-considered decision is another
> question).
>
> [3] http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html#3
>
> Separate the cases of non-graphical text (SF) and graphical text (S) in
> WebCGM 1.0.
>
> Non-graphical text (SF): T.14.5
> -----
> T.14.5 says that the metafile id (BEGIN METAFILE parameter, type SF) shall
> have as its first 4 octets the 4-octet sequences above, to declare for the
> whole metafile that SF is UTF-8 or UTF-16.
>
> Conclusion: no problem here.
>
> Graphical text (S): T.16.14
> -----
> T.16.14 takes the last character of the above 4-octet sequences as the
> 'tail', for use in the CHARACTER SET LIST (CSL) element. So the two-part
> data for CSL are specified as:
>
> UTF-8 implementation level 3: 'complete code', 4/9
> UTF-16 implementation level 3: 'complete code', 4/12
>
> This was based on information in CGM:1999 section 6.3.4.3, that
> characterizes the escape sequences for complete codes as: ESC 2/5 I* F.
> I* is zero or more "intermediate characters", and F is a single final
> character. WebCGM 1.0 took only F for the tail. But CGM:1999 says:
>
> >The character set declaration ... consists of 'complete code' followed by
> >a string consisting of those characters in the code's ISO 2022 escape
> >sequence which come after the first two characters, ESC 2/5.
>
> Conclusion: WebCGM 1.0 is wrong for the CSL tails for UTF-8 and
> UTF-16. The CSL data should be:
>
> UTF-8 implementation level 3: 'complete code', 2/15 4/9
> UTF-16 implementation level 3: 'complete code', 2/15 4/12
>
> ISSUES:
> ===
> ISSUE 1: Should we issue an erratum for WebCGM 1.0?
>
> Alternatives:
> Alt.1: No
> Alt.2: Yes
>
> Recommendation for Issue 1: Alt.1, No.
>
> Discussion: CGM:1999 CSL is not really an implementation of ISO 2022, but
> rather takes concepts and bits of escape sequences as parameters for the
> CSL to designate character sets. WebCGM 1.0 lists data that is to be used
> to designate 6 character sets. Though wrong according to ISO2022, on the
> other hand these are effectively just tokens to select the 6 char. sets,
> and it is unambiguous in the context of WebCGM. To change WebCGM 1.0 by
> erratum will invalidate existing WebCGM 1.0 products in the field, for new
> WebCGM 1.0 content. And would cause existing "valid" 1.0 content to become
> invalid. It's not worth it, IMO.
>
> ISSUE 2: Should we correct it for WebCGM 2.0?
>
> Alternatives:
> Alt.1: No
> Alt.2: Yes
> Alt.3: Yes, but do it by deprecation of the old 1.0 forms (2.0 generators
> shall generate only the 2.0 forms, 2.0 viewers shall accept 1.0 forms as
> well as 2.0 forms)
>
> Recommendation for Issue 2: Alt.3, Yes, but by deprecation of old.
>
> Discussion: If generators are writing 2.0 files, and they put out the
> proper forms, then there really shouldn't be a problem with old (1.0)
> viewers in the field -- they won't understand other 2.0 stuff anyway. 2.0
> generators and 2.0 viewers will be using "correct" forms.
>
> Thoughts?
>
> -Lofton.
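
For anyone who wants to check the octets, here is a minimal Python sketch of the two tail derivations discussed above. It assumes the usual ISO 2022 column/row convention (x/y means octet 16*x + y); the function names are made up for illustration and do not come from CGM:1999 or WebCGM.

```python
ESC = 0x1B  # the ESC control character

def col_row(notation: str) -> int:
    """Convert column/row notation such as '2/15' to an octet (16*col + row)."""
    col, row = notation.split("/")
    return int(col) * 16 + int(row)

def parse_escape(seq: str) -> bytes:
    """Turn a string like 'ESC 2/5 2/15 4/9' into the octets it denotes."""
    return bytes(ESC if tok == "ESC" else col_row(tok) for tok in seq.split())

def tail_cgm1999(seq: str) -> bytes:
    """Tail as quoted from CGM:1999: everything after the first two characters, ESC 2/5."""
    raw = parse_escape(seq)
    if raw[:2] != bytes([ESC, col_row("2/5")]):
        raise ValueError("not a 'complete code' escape sequence (ESC 2/5 ...)")
    return raw[2:]

def tail_webcgm10(seq: str) -> bytes:
    """Tail as WebCGM 1.0 took it: only the final character F."""
    return parse_escape(seq)[-1:]

# The two sequences found at [1] and [2].
UTF8_L3 = "ESC 2/5 2/15 4/9"
UTF16_L3 = "ESC 2/5 2/15 4/12"

for name, seq in (("UTF-8", UTF8_L3), ("UTF-16", UTF16_L3)):
    print(name, "CGM:1999 tail:", tail_cgm1999(seq), "WebCGM 1.0 tail:", tail_webcgm10(seq))
```

Under Alt.3 for Issue 2, a 2.0 viewer would accept either tail for a given character set, while a 2.0 generator would write only the longer one.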