office-formula message

Subject: Re: [office-formula] CODE and CHAR should not be Unicode aware, proposing UNICODE and UNICHAR

From: Andreas J Guelzow <aguelzow@math.concordia.ab.ca>
To: OASIS ODFF SC <office-formula@lists.oasis-open.org>
Date: Thu, 15 Feb 2007 08:15:53 -0700

On Thu, 2007-15-02 at 13:47 +0100, Eike Rathke wrote:
> Hi,
> 
> CODE and CHAR currently are defined to handle Unicode. For
> interoperability (sigh) reasons I don't think that is a good idea. Ecma
> doesn't say anything about it, but Excel versions up to Excel 2003
> handle those functions differently, depending even on the system where
> the document originated: for documents created on Windows it uses the
> Windows-1252 ANSI code page, and for documents created on a Mac it uses
> a Mac code page; I would have to look up in the Excel online help
> exactly which one. Unicode is not supported. I don't know what Excel
> 2007 does, though. Anyone?
> 
> Furthermore, the Korean (and maybe Japanese, others?) localized Excel
> versions seem (!) to support Unicode with these functions, but the CODE
> function delivers the collation point of a syllable character rather
> than the Unicode value, which I consider sophisticated nonsense. When
> loaded into an English Excel version the functionality is lost and code
> 63 for question mark is the result instead, since Korean characters
> aren't present in cp1252. When stored with an English Excel and loaded
> in a Korean version again the result is still broken (stored value
> displayed) unless the formula is recalculated.
> 
> It seems right to restrict CODE and CHAR to a code page, though I don't
> see a way to include the Windows/Mac differentiation in ODF. If we
> define the Windows-1252 code page, Mac documents imported will output
> garbage for those functions. An application can handle this when
> importing an Excel document, but the information will be lost once
> stored as ODF. Additionally, using a code page not matching the current
> system's encoding may also lead to garbage with user input, so
> applications may tend to use the current encoding instead, or you'd need
> to map things twice. Gnumeric uses cp1252, KSpread uses Unicode, and OOo
> uses the system encoding. Taken together, this makes CODE and CHAR highly
> unportable. I propose to define cp1252 to be used with these functions.
> Opinions?

That seems reasonable.
> 
> For a clean Unicode environment and portable documents I propose to add
> two new functions UNICODE and UNICHAR. Objections?

Please note that Gnumeric already has a unicode and a unichar function.
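
To make the difference between the two pairs concrete, here is a minimal
Python sketch of the semantics discussed above. It is only an illustration
under stated assumptions, not the behaviour of Gnumeric, Excel or any other
application: the function names are hypothetical, CODE is assumed to fall
back to 63 ("?") for characters outside cp1252 (as in the Korean example),
and CHAR is assumed to accept 1..255.

```python
def code_cp1252(text: str) -> int:
    """CODE restricted to Windows-1252 (hypothetical helper).

    Returns the cp1252 byte of the first character, or 63 ('?') when the
    character has no cp1252 mapping.
    """
    if not text:
        raise ValueError("CODE needs a non-empty string")
    # errors="replace" substitutes b"?" for unmappable characters.
    return text[0].encode("cp1252", errors="replace")[0]


def char_cp1252(number: int) -> str:
    """CHAR restricted to Windows-1252 (hypothetical helper).

    The 1..255 range is an assumption. Python's cp1252 codec leaves a few
    bytes (for example 0x81) undefined; decoding those raises
    UnicodeDecodeError here.
    """
    if not 1 <= number <= 255:
        raise ValueError("CHAR argument must be in 1..255")
    return bytes([number]).decode("cp1252")


def unicode_code(text: str) -> int:
    """Proposed UNICODE: Unicode code point of the first character."""
    if not text:
        raise ValueError("UNICODE needs a non-empty string")
    return ord(text[0])


def unichar(number: int) -> str:
    """Proposed UNICHAR: character for a Unicode code point."""
    return chr(number)
```

With these definitions the euro sign has code 128 under cp1252 but 8364
(U+20AC) under Unicode, and a Korean syllable such as U+D55C collapses to 63
with the cp1252 variant while UNICODE and UNICHAR round-trip it unchanged.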

Andreas

-- 
"Liberty consists less in acting according to
one's own pleasure, than in not being subject 
to the will and pleasure of other people. It 
consists also in our not subjecting the wills 
of other people to our own."  Rousseau


Prof. Dr. Andreas J. Guelzow
Dept. of Mathematical & Computing Sciences
Concordia University College of Alberta

Follow-Ups:
- Re: [office-formula] CODE and CHAR should not be Unicode aware, proposing UNICODE and UNICHAR
  - From: Eike Rathke <erack@sun.com>

References:
- CODE and CHAR should not be Unicode aware, proposing UNICODE and UNICHAR
  - From: Eike Rathke <erack@sun.com>