Subject: RE: WebCGM and UTF16
I also agree with this approach.

Kevin O'Kane
Auto-trol Technology Corp.
Denver CO
kevoka@auto-trol.com
www.auto-trol.com
303-252-2821

> -----Original Message-----
> From: Dieter@isodraw.de [SMTP:Dieter@isodraw.de]
> Sent: Thursday, August 16, 2001 10:39 AM
> To: Lofton Henderson; cgmopen-members@lists.oasis-open.org
> Subject: Re: WebCGM and UTF16
>
> All,
>
> I support this approach. We changed both IsoDraw 5 and IsoView 3 to write
> UTF-16 as described below.
> Any older files that may have been written in little-endian byte order can
> be read by IsoDraw and saved again as big-endian.
>
> Dieter Weidenbrück
> ITEDO Software GmbH
>
> ----- Original Message -----
> From: Lofton Henderson <mailto:lofton@rockynet.com>
> To: cgmopen-members@lists.oasis-open.org
> <mailto:cgmopen-members@lists.oasis-open.org>
> Sent: Wednesday, August 15, 2001 4:55 PM
> Subject: WebCGM and UTF16
>
> CGM Open Members,
>
> Recently, a question came up about the use of Unicode UTF16 in
> WebCGM instances. The byte order of the two-byte codes of UTF16 is not
> unambiguously specified by the Unicode standard. For example, to
> represent the 6 character ASCII string "WebCGM" in UTF16, the same 7-bit
> ASCII codes are used for one byte of the UTF16 representation, and the
> other byte is zero (this is true of 8-bit ISOLatin1 also, not just the LHS
> ASCII subset). So, would the data stream in a WebCGM instance be the
> 12-byte sequence:
>
> Option a): 0 W 0 e 0 b 0 C 0 G 0 M
>
> or is it:
>
> Option b): W 0 e 0 b 0 C 0 G 0 M 0
>
> This issue is discussed in section 2.7 of Unicode (see
> http://www.unicode.org/unicode/uni2book/ch02.pdf). An optional (not
> required) BOM (byte order marker) is defined, for use in circumstances
> where the order might otherwise be ambiguous.
>
> Here is the ambiguity with regard to WebCGM parameters of type SF
> (non-graphical string) or S (graphical string) -- is the BOM:
>
> 1. prohibited?
> 2. or, required?
> 3. or, allowed but not required?
>
> Implicit in #1 is that a single standard order is mandated for all
> UTF16 strings in all WebCGM instances. There are all sorts of flavors and
> questions associated with #2 and #3: what is the default (if #3); does
> the BOM (0xFEFF or 0xFFFE) have to occur in every string instance; ...?
>
> (Tutorial background. Recall that type SF strings are all of one
> character set in a given WebCGM instance, and that type is IsoLatin1 by
> default, and may be changed to UTF8 or UTF16 by a 4-character esc
> [introducer] sequence at the start of the BegMF id string. Character sets
> of type S strings may be switched within a WebCGM using the normal
> Character Set List and (Alternate) Character Set Index mechanisms.)
>
> We think that #1 is the correct WebCGM interpretation. The CGM
> binary encoding was specified with an unambiguous byte order, after
> considerable discussion (mid-1980s) about the endian issue. If you view
> the 16-bit UTF16 codes as a CGM "word" (see section 5.3 of Part 3),
> then the correct representation of UTF16 codes in the WebCGM data stream
> is "big endian", i.e., Option a) above:
>
> 0 W 0 e 0 b 0 C 0 G 0 M
>
> This interpretation has been agreed by the one implementation I know
> of that can generate UTF16.
>
> Does anyone disagree with this interpretation and clarification?
>
> Regards,
> Lofton.
>
> *******************
> Lofton Henderson
> 1919 Fourteenth St., #604
> Boulder, CO 80302
>
> Phone: 303-449-8728
> Email: lofton@rockynet.com
> *******************
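
For readers who want to see the two candidate byte sequences from the thread concretely, here is a minimal Python sketch. It is not taken from any WebCGM implementation: the helper name `little_to_big_endian` is hypothetical, and the sketch assumes the string bytes carry no BOM (interpretation #1 above). It uses only Python's standard `utf-16-be` and `utf-16-le` codecs, neither of which writes a BOM.

```python
# Illustrative sketch only; names here are hypothetical, not from any CGM tool.
text = "WebCGM"

# Option a): big-endian, high (zero) byte first, no BOM.
option_a = text.encode("utf-16-be")

# Option b): little-endian, low byte first, no BOM.
option_b = text.encode("utf-16-le")


def little_to_big_endian(data: bytes) -> bytes:
    """Re-encode BOM-less little-endian UTF-16 bytes as big-endian.

    Assumes the caller already knows the input is little-endian, as with
    the older files mentioned in the thread.
    """
    return data.decode("utf-16-le").encode("utf-16-be")


print(option_a)  # bytes for Option a)
print(option_b)  # bytes for Option b)
print(little_to_big_endian(option_b) == option_a)
```

Without a BOM the two orders cannot be told apart reliably from the bytes alone, which is why the thread settles on a single mandated order rather than per-string detection.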