OASIS Mailing List Archives

cgmopen-members message



Subject: RE: WebCGM and UTF16


I also agree with this approach.

Kevin O'Kane
Auto-trol Technology Corp.
Denver CO
kevoka@auto-trol.com
www.auto-trol.com
303-252-2821


> -----Original Message-----
> From:	Dieter@isodraw.de [SMTP:Dieter@isodraw.de]
> Sent:	Thursday, August 16, 2001 10:39 AM
> To:	Lofton Henderson; cgmopen-members@lists.oasis-open.org
> Subject:	Re: WebCGM and UTF16
> 
> All,
>  
> I support this approach. We changed both IsoDraw 5 and IsoView 3 to write
> UTF-16 as described below. 
> Any older files that may have been written in little-endian byte order can
> be read by IsoDraw and saved again as big-endian.
>  
> Dieter Weidenbrück
> ITEDO Software GmbH
> 
> 	----- Original Message ----- 
> 	From: Lofton Henderson <mailto:lofton@rockynet.com> 
> 	To: cgmopen-members@lists.oasis-open.org
> <mailto:cgmopen-members@lists.oasis-open.org> 
> 	Sent: Wednesday, August 15, 2001 4:55 PM
> 	Subject: WebCGM and UTF16
> 
> 	CGM Open Members,
> 	
> 	Recently, a question came up about the use of Unicode UTF16 in
> WebCGM instances.  The byte order of the two-byte codes of UTF16 is not
> unambiguously specified by the Unicode standard.  For example, to
> represent the 6 character ASCII string "WebCGM" in UTF16, the same 7-bit
> ASCII codes are used for one byte of the UTF16 representation, and the
> other byte is zero (this is true of 8-bit ISOLatin1 also, not just the LHS
> ASCII subset).  So, would the data stream in a WebCGM instance be the
> 12-byte sequence:
> 	
> 	Option a):  0 W 0 e 0 b 0 C 0 G 0 M
> 	
> 	or is it:
> 	
> 	Option b):  W 0 e 0 b 0 C 0 G 0 M 0
> 	
> 	This issue is discussed in section 2.7 of Unicode (see
> <http://www.unicode.org/unicode/uni2book/ch02.pdf>).  An optional (not
> required) BOM (byte order mark) is defined, for use in circumstances
> where the order might otherwise be ambiguous.
> 	
> 	Here is the ambiguity with regard to WebCGM parameters of type SF
> (non-graphical string) or S (graphical string) -- is the BOM:
> 	
> 	1. prohibited?
> 	2. or, required?
> 	3. or, allowed but not required?
> 	
> 	Implicit in #1 is that a single standard order is mandated for all
> UTF16 strings in all WebCGM instances.  There are all sorts of flavors and
> questions associated with #2 and #3:  what is the default (if #3); does
> the BOM (0xFEFF or 0xFFFE) have to occur in every string instance; ...? 
> 	
> 	(Tutorial background.  Recall that type SF strings are all of one
> character set in a given WebCGM instance, and that type is IsoLatin1 by
> default, and may be changed to UTF8 or UTF16 by a 4-character esc
> [introducer] sequence at the start of the BegMF id string.  Character sets
> of type S strings may be switched within a WebCGM using the normal
> Character Set List and (Alternate) Character Set Index mechanisms.)
> 	
> 	We think that #1 is the correct WebCGM interpretation.  The CGM
> binary encoding was specified with an unambiguous byte order, after
> considerable discussion (mid-1980s) about the endian issue.  If you view
> the 16-bit UTF16 codes as a CGM "word" (see section 5.3 of Part 3), then
> the correct representation of UTF16 codes in the WebCGM data stream is
> "big endian", i.e., Option (a) above:
> 	
> 	0 W 0 e 0 b 0 C 0 G 0 M
> 	
> 	This interpretation has been agreed by the one implementation I know
> of that can generate UTF16.
> 	
> 	Does anyone disagree with this interpretation and clarification?
> 	
> 	Regards,
> 	Lofton.
> 	
> 	
> 	
> 	
> 	*******************
> 	Lofton Henderson
> 	1919 Fourteenth St., #604
> 	Boulder, CO   80302
> 
> 	Phone:  303-449-8728
> 	Email:  lofton@rockynet.com
> 	******************* 
> 
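For reference, the two candidate byte sequences from the quoted message (Option a and Option b) can be reproduced with a short Python sketch. It assumes Python's standard "utf-16-be" and "utf-16-le" codecs, which write no BOM; the helper name show is made up for this illustration and is not part of any CGM tool.

```python
text = "WebCGM"

# Option a): big-endian, no BOM (the interpretation proposed in the thread).
option_a = text.encode("utf-16-be")
# Option b): little-endian, no BOM.
option_b = text.encode("utf-16-le")


def show(data: bytes) -> str:
    """Hypothetical helper: print '0' for a zero byte, else the ASCII character."""
    return " ".join("0" if b == 0 else chr(b) for b in data)


print(show(option_a))  # 0 W 0 e 0 b 0 C 0 G 0 M
print(show(option_b))  # W 0 e 0 b 0 C 0 G 0 M 0
```

Both results are 12 bytes long; only the position of the zero byte within each 16-bit code differs.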

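The re-saving of older little-endian strings as big-endian, mentioned in the reply above, might look roughly like the following. This is only a sketch under stated assumptions, not how IsoDraw or any other product does it: normalize_utf16 is a hypothetical name, the input is assumed to be the raw bytes of a single UTF16 string, and a leading BOM (which interpretation #1 would prohibit in a conforming file) is tolerated on input and dropped on output.

```python
import codecs


def normalize_utf16(data: bytes, default: str = "utf-16-be") -> bytes:
    """Hypothetical helper: return `data` as big-endian UTF-16 without a BOM.

    A leading BOM, if present, decides the input byte order and is removed.
    Without a BOM, the `default` codec name is assumed for the input.
    """
    if data.startswith(codecs.BOM_UTF16_BE):      # bytes FE FF
        text = data[2:].decode("utf-16-be")
    elif data.startswith(codecs.BOM_UTF16_LE):    # bytes FF FE
        text = data[2:].decode("utf-16-le")
    else:
        text = data.decode(default)
    return text.encode("utf-16-be")


# An older string written little-endian without a BOM:
legacy = "WebCGM".encode("utf-16-le")
fixed = normalize_utf16(legacy, default="utf-16-le")
```

Without a BOM the byte order cannot be detected reliably from the bytes alone, so the caller has to state what the legacy data is assumed to be. That is the ambiguity a single mandated order avoids.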
