
Subject: Re: [office-comment] Text in OpenFormula - inadequate for international use

From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: alexb@griffinbrown.co.uk
Date: Wed, 05 May 2010 15:09:00 -0400 (EDT)

Alex Brown:
> If so, I think it is unacceptable for an international standard (any standard, frankly) not to have basic support for international text required as a basic required provision.

The specification is specifically written to *permit* support of arbitrary Unicode/10646 text at run-time; if any passage *inhibits* this, that's an error, and we need to fix it.  Please let us know of any.  I don't know of any (the passage you cited permits it, for example).

The primary purpose of all spreadsheet formulas is to calculate *numbers*, not *text*.  *No* spreadsheet implementation is good at processing text, because that's not what spreadsheets are intended for.  What little text processing is done is often something like "show as many * characters as the number in this other cell", which does not depend on international characters.  Indeed, the language does not even define an iterator, so a lot of text processing simply *can't* be done by spreadsheet formulas.

So I read your comment as, "this language is not very good at tasks it's not intended for."  That is true, but it is true of everything made by mankind.

It would be trivial to change the *specification* to require Unicode/ISO 10646 support; just change a few "should"s to "shall"s.  But that would not suddenly make *implementations* support it, especially beyond the BMP.  Since the language is intended primarily for numerical calculation, its few text processing functions exist for various trivial and historical purposes.  Producing a specification that is *not* implemented is a sham and a waste of everyone's time, and we really want to *avoid* that.
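To illustrate why "beyond the BMP" is where implementations struggle: assume an engine that stores text as UTF-16 code units, where a character outside the BMP takes two units (a surrogate pair).  The Python sketch below is hypothetical; naive_len/naive_left model such an engine and are not OpenFormula's definitions of LEN or LEFT, nor any product's code.

```python
# Illustrative sketch only: models a hypothetical engine that stores text as
# UTF-16 code units. Neither pair of functions is taken from OpenFormula.

def naive_len(s: str) -> int:
    """LEN() counting UTF-16 code units, as a code-unit-based engine might."""
    return len(s.encode("utf-16-le")) // 2


def naive_left(s: str, n: int) -> str:
    """LEFT() that keeps the first n UTF-16 code units.

    If the cut lands between the two halves of a surrogate pair, the
    dangling half cannot be decoded and is replaced with U+FFFD.
    """
    units = s.encode("utf-16-le")[: 2 * n]
    return units.decode("utf-16-le", errors="replace")


def cp_len(s: str) -> int:
    """LEN() counting Unicode code points (Python str indexes by code point)."""
    return len(s)


def cp_left(s: str, n: int) -> str:
    """LEFT() that keeps the first n code points."""
    return s[:n]


text = "a\U0001D11Eb"  # 'a', MUSICAL SYMBOL G CLEF (outside the BMP), 'b'
print(naive_len(text), cp_len(text))  # 4 3
print(repr(naive_left(text, 2)), repr(cp_left(text, 2)))
```

For BMP-only text the two approaches agree; they diverge only once supplementary characters appear, which is exactly the case that mandating support would force implementations to handle.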

There's an obvious compromise position, thankfully.  We could mandate Unicode/10646 support, at least for the BMP, in the "medium" group conformance clause.  That way, tiny implementations that implement a small set of functions could still conform to *something*, and there'd be an obvious growth path.  I think that's the better approach, if it is to be mandated somewhere.

> A thorough pass should be made of the text to remove references to ASCII text (except for legacy purposes) and rebase text representation and handling on Unicode.

The "limits" section requires a minimum number of characters, and I think the best way to make that sensible is to specify the minimum number of ASCII characters in a text (string) type.  Otherwise, the limits in practice would depend on the encoding (e.g., UTF-8 vs. UTF-16, whether characters beyond the BMP are allowed, etc.), and they'd be hard for users to understand.
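A quick Python sketch of the encoding problem (illustrative only, not part of the specification; the sample strings are arbitrary): the same text measured as code points, UTF-8 bytes, and UTF-16 code units.  For ASCII text all three counts coincide, which is why a limit stated in ASCII characters means the same thing everywhere.

```python
# Illustrative sketch: one string measured three ways. The samples are
# arbitrary examples, not taken from the OpenFormula limits section.

def sizes(s: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 code units) for s."""
    return (
        len(s),
        len(s.encode("utf-8")),
        len(s.encode("utf-16-le")) // 2,
    )


samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001D11E"]
for sample in samples:
    print(repr(sample), sizes(sample))
```

"hello" gives (5, 5, 5), but the counts diverge for everything else: "h\u00e9llo" is 5 code points yet 6 UTF-8 bytes, the three CJK characters take 9 UTF-8 bytes, and U+1D11E is 1 code point, 4 UTF-8 bytes, and 2 UTF-16 units.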

--- David A. Wheeler

Follow-Ups:
- Re: [office-comment] Text in OpenFormula - inadequate for international use
  - From: marbux <marbux@gmail.com>
- RE: [office-comment] Text in OpenFormula - inadequate for international use
  - From: Alex Brown <alexb@griffinbrown.co.uk>
- Re: [office-comment] Text in OpenFormula - inadequate for international use
  - From: "Leonard Mada" <discoleo@gmx.net>

References:
- Text in OpenFormula - inadequate for international use
  - From: Alex Brown <alexb@griffinbrown.co.uk>