
cgmo-webcgm message



Subject: RE: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)


Issue 1: Alt 1, no
Issue 2: Alt 3, deprecate and change to new form

> -----Original Message-----
> From: Lofton Henderson [mailto:lofton@rockynet.com]
> Sent: Friday, June 24, 2005 1:29 AM
> To: cgmo-webcgm@lists.oasis-open.org
> Subject: [cgmo-webcgm] UTF-8 & UTF-16 sequences (ISSUEs)
>
>
> WebCGM TC,
>
> I have an action item to research "UTF-x sequence tails".  Thanks to
> Forrest for providing me some references and some motivation, I have
> gotten the information, and I make recommendations below.
>
> [1] http://www.unihan.com.cn/Cjk/ana18.htm
> [2] http://www.unihan.com.cn/Cjk/ana19.htm
>
> At [1] and [2], we find the ISO/IEC 2022 escape sequences:
>
> UTF-8 implementation level 3:  ESC 2/5 2/15 4/9
> UTF-16  implementation level 3:  ESC 2/5 2/15 4/12
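
The column/row notation above maps to octet values as column * 16 + row. A minimal Python sketch of that conversion follows; the helper names `octet` and `sequence` are hypothetical and come from neither specification.

```python
def octet(token: str) -> int:
    """Convert 'ESC' or a column/row token such as '2/15' to an octet value."""
    if token == "ESC":
        return 0x1B
    col, row = (int(part) for part in token.split("/"))
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must be 0..15: {token!r}")
    # Column selects the high nibble, row the low nibble.
    return col * 16 + row


def sequence(notation: str) -> bytes:
    """Convert a whitespace-separated sequence such as 'ESC 2/5 2/15 4/9'."""
    return bytes(octet(tok) for tok in notation.split())


UTF8_LEVEL3 = sequence("ESC 2/5 2/15 4/9")    # b"\x1b%/I"
UTF16_LEVEL3 = sequence("ESC 2/5 2/15 4/12")  # b"\x1b%/L"
```

Both results are 4 octets long, which matches the "first 4 octets" wording used for T.14.5.
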
>
> At [3], I found a lucid explanation of this stuff, and particularly what
> "implementation levels 1, 2, and 3" mean.  In the past, we chose
> implementation level 3 (whether or not it was a well-considered decision
> is another question).
>
> [3] http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html#3
>
> Separate the cases of non-graphical text (SF) and graphical text (S) in
> WebCGM 1.0.
>
> Non-graphical text (SF):  T.14.5
> -----
> T.14.5 says that the metafile id (BEGIN METAFILE parameter, type SF) shall
> have as its first 4 octets one of the 4-octet sequences above, to declare
> for the whole metafile that SF is UTF-8 or UTF-16.
>
> Conclusion:  no problem here.
>
> Graphical text (S):  T.16.14
> -----
> T.16.14 takes the last character of the above 4-octet sequences as the
> 'tail', for use in the CHARACTER SET LIST (CSL) element.  So the two-part
> data for CSL are specified as:
>
> UTF-8 implementation level 3:  'complete code', 4/9
> UTF-16  implementation level 3:  'complete code', 4/12
>
> This was based on information in CGM:1999 section 6.3.4.3, that
> characterizes the escape sequences for complete codes as:  ESC 2/5 I* F.
> I* is zero or more "intermediate characters", and F is a single final
> character.  WebCGM 1.0 took only F for the tail.  But CGM:1999 says:
>
> >The character set declaration ... consists of 'complete code' followed by
> >a string consisting of those characters in the code's ISO 2022 escape
> >sequence which come after the first two characters, ESC 2/5.
>
> Conclusion:  WebCGM 1.0 is wrong for the CSL tails for UTF-8 and
> UTF-16.  The CSL data should be:
>
> UTF-8 implementation level 3:  'complete code', 2/15 4/9
> UTF-16  implementation level 3:  'complete code', 2/15 4/12
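
The difference between the two readings can be sketched in Python as below. The function names are hypothetical; the first follows the CGM:1999 wording quoted above (everything after ESC 2/5), and the second follows what WebCGM 1.0 did (the final character F only).

```python
ESC_COMPLETE_CODE = b"\x1b%"  # ESC 2/5, the prefix shared by complete-code sequences


def cgm1999_tail(escape_sequence: bytes) -> bytes:
    """Tail per CGM:1999: all characters after the first two, ESC 2/5."""
    if not escape_sequence.startswith(ESC_COMPLETE_CODE):
        raise ValueError("not a complete-code escape sequence")
    return escape_sequence[len(ESC_COMPLETE_CODE):]


def webcgm10_tail(escape_sequence: bytes) -> bytes:
    """Tail as WebCGM 1.0 specified it: the single final character F."""
    if not escape_sequence.startswith(ESC_COMPLETE_CODE):
        raise ValueError("not a complete-code escape sequence")
    return escape_sequence[-1:]


utf8 = b"\x1b%/I"   # ESC 2/5 2/15 4/9
utf16 = b"\x1b%/L"  # ESC 2/5 2/15 4/12
```

For these two sequences the CGM:1999 tails are 2/15 4/9 and 2/15 4/12, while the WebCGM 1.0 tails are 4/9 and 4/12.
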
>
> ISSUES:
> ===
> ISSUE 1:  Should we issue an erratum for WebCGM 1.0?
>
> Alternatives:
> Alt.1:  No
> Alt.2:  Yes
>
> Recommendation for Issue 1:  Alt.1, No.
>
> Discussion:  CGM:1999 CSL is not really an implementation of ISO 2022, but
> rather takes concepts and bits of escape sequences as parameters for the
> CSL to designate character sets.  WebCGM 1.0 lists data that is to be used
> to designate 6 character sets.  Though wrong according to ISO 2022, these
> are effectively just tokens to select the 6 character sets, and they are
> unambiguous in the context of WebCGM.  Changing WebCGM 1.0 by erratum
> would invalidate existing WebCGM 1.0 products in the field with respect to
> new WebCGM 1.0 content, and would cause existing "valid" 1.0 content to
> become invalid.  It's not worth it, IMO.
>
> ISSUE 2:  Should we correct it for WebCGM 2.0?
>
> Alternatives:
> Alt.1:  No
> Alt.2:  Yes
> Alt.3:  Yes, but do it by deprecation of the old 1.0 forms (2.0 generators
> shall generate only the 2.0 forms, 2.0 viewers shall accept 1.0 forms as
> well as 2.0 forms)
>
> Recommendation for Issue 2:  Alt.3, Yes, but by deprecation of old.
>
> Discussion:  If generators are writing 2.0 files, and they put out the
> proper forms, then there really shouldn't be a problem with old (1.0)
> viewers in the field -- they won't understand other 2.0 stuff anyway.  2.0
> generators and 2.0 viewers will be using "correct" forms.
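
The Alt.3 behavior for Issue 2 could look roughly like the following Python sketch. It assumes only the UTF-8 and UTF-16 tails discussed in this thread; the table and function names are hypothetical, and nothing here is taken from WebCGM 2.0 text.

```python
from typing import Tuple

# 2.0 forms: characters after ESC 2/5 (2/15 4/9 and 2/15 4/12).
TAILS_2_0 = {b"/I": "UTF-8", b"/L": "UTF-16"}
# Deprecated 1.0 forms: final character only (4/9 and 4/12).
TAILS_1_0 = {b"I": "UTF-8", b"L": "UTF-16"}


def generator_tail(encoding: str) -> bytes:
    """A 2.0 generator writes only the 2.0 form."""
    for tail, name in TAILS_2_0.items():
        if name == encoding:
            return tail
    raise ValueError(f"no CSL tail known for {encoding!r}")


def viewer_lookup(tail: bytes) -> Tuple[str, bool]:
    """A 2.0 viewer accepts both forms; the flag marks a deprecated 1.0 form."""
    if tail in TAILS_2_0:
        return TAILS_2_0[tail], False
    if tail in TAILS_1_0:
        return TAILS_1_0[tail], True
    raise ValueError(f"unrecognized CSL tail: {tail!r}")
```

Because the 1.0 and 2.0 tails differ in length, the two tables never collide, so a viewer can accept both without ambiguity.
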
>
> Thoughts?
>
> -Lofton.
>
>
>


