office-formula message

Subject: Re: [office-formula] Summary of 2009-10-13 teleconference

From: Eike Rathke <erack@sun.com>
To: office-formula@lists.oasis-open.org
Date: Wed, 14 Oct 2009 14:08:18 +0200

Hi,

On Tuesday, 2009-10-13 11:39:08 -0400, David A. Wheeler wrote:

> * The "Text" (String) type will simply be defined as something that can contain 0 or more "characters".  We will separately discuss what "character" means.  I personally think we should recommend, but not require, that implementations support a character set and encoding that permit any legal Unicode code point to be a character, but we didn't discuss that today.

I think we should say that implementations
- shall support Unicode BMP and
- should support the entire Unicode character range.

This accommodates implementations that internally use only UCS-2
encoding, though I don't know whether any are limited in that way.
I don't think there would be implementations of OpenFormula that do not
support the Unicode BMP.
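
To illustrate the BMP distinction, here is a minimal Python sketch. The
helper names is_bmp_only and utf16_code_units are made up for this
example and are not part of OpenFormula or of any implementation. A
character outside the BMP needs two UTF-16 code units (a surrogate
pair), which is what a UCS-2-only implementation cannot represent.

```python
BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_bmp_only(text: str) -> bool:
    """Return True if every code point in text lies within the BMP.

    Hypothetical helper, for illustration only.
    """
    return all(ord(ch) <= BMP_MAX for ch in text)


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
    print(is_bmp_only("Gr\u00fc\u00dfe"), is_bmp_only(clef))
    print(utf16_code_units("A"), utf16_code_units(clef))
```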

> * We will need to add a discussion noting that implementations may have a specific character set and character encoding as a setting, and that this may limit which characters may be included in strings.

I don't see how that would affect the specification, other than that we
could note that some implementations are limited and thus results may
differ if an ODF/OpenFormula document is read by such an implementation.
I don't think that is the responsibility of the specification though.

> Do we need to have a way to STORE this information in an OpenDocument file? If so, how?

I don't think so. Strings are stored in the encoding given by <?xml encoding="...">.
I don't see much benefit in storing the internal encoding of the
generating implementation, other than that readers would then be
required to convert from their internal Unicode encoding to that other
encoding for functions such as CODE() and CHAR(). Doing so would impose
a bunch of otherwise unnecessary conversion routines on implementations,
maybe even including encodings not registered with IANA. However, we
define those functions to use ASCII for values 1<=N<=127 and to be
implementation-dependent for values 128<=N<=255 anyway.
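
As a rough sketch of that behaviour (not a proposal for specification
text): the names formula_char and formula_code and the high_encoding
parameter below are hypothetical. The parameter stands in for whatever
an implementation chooses for values above 127, and the "latin-1"
default is only an assumption made so the example runs.

```python
def formula_char(n: int, high_encoding: str = "latin-1") -> str:
    """Sketch of CHAR(N): ASCII for 1..127, implementation's choice above."""
    if 1 <= n <= 127:
        return chr(n)  # ASCII, regardless of internal representation
    if 128 <= n <= 255:
        # Implementation-dependent range; high_encoding is an assumption.
        return bytes([n]).decode(high_encoding)
    raise ValueError(f"CHAR argument out of range: {n}")


def formula_code(text: str, high_encoding: str = "latin-1") -> int:
    """Sketch of CODE(T): the value for the first character of text."""
    if not text:
        raise ValueError("CODE requires a non-empty string")
    first = text[0]
    if ord(first) <= 127:
        return ord(first)  # ASCII range: the value is the code point
    try:
        encoded = first.encode(high_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"{first!r} has no value in {high_encoding}") from exc
    if len(encoded) != 1:
        raise ValueError(f"{first!r} is not a single byte in {high_encoding}")
    return encoded[0]
```

With these definitions, formula_char(65) and formula_code("A") agree
for any choice of high_encoding, while formula_char(128) depends on it.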

> * CHAR() and CODE() are *not* deprecated... they stay in.  Instead, they "normalize" to ASCII values, regardless of the internal representation.  I believe this only affects those who use a particular Arabic encoding that uses 0...127 for non-ASCII characters, and only when using those functions.

We still define 128..255 to be implementation-dependent, yes?

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO statements
of spreadsheets.  --Robert Weir on the OpenDocument formula subcommittee's list.


References:
- Summary of 2009-10-13 teleconference
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>