office-comment message

Subject: Re: [office] Next set of public comments for review (#18-#35)

From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Tue, 03 Jun 2008 21:57:31 +0300

Dear list members,

This is a comment on one of the threads on the restricted
office@lists.oasis-open.org list.

I believe the discussion is too important and the current implementation 
is too dangerous to keep the current state of affairs.

> I believe #24 is done. This involved handling references to empty 
> cells; new text in section 4 (Types) clarifies what happens in these 
> cases.
>
> Basically, there isn't a special type called the empty type.

This goes strongly against ALL concepts of professional statistical
applications, where such values are given a special type/identifier:
they are "missing values", or NA for short (e.g. in R,
http://cran.R-project.org, and others).

> Instead, there is a type "Reference", which _may_ refer to empty cells 
> or ranges that include empty cells. What happens to empty cells 
> depends on the required types and implicit conversion rules, as 
> defined in section 4.

I hate implicit conversion rules. In particular, the conversion to
(int/float) 0 is mostly unneeded and unwanted.

There are basically 2 useful approaches:
A.) ignore the cell completely
-- without raising any error
-- without interpreting the cell as 0 or empty string
-- should probably be the default

B.) Interpret as an NA-Error and propagate the error downstream
-- probably only in a minority of cases
[e.g. debugging spreadsheets]
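
As a rough sketch of the two approaches (in Python, purely for
illustration; the names NAError, average and propagate_na are
hypothetical and not part of ODF or of any spreadsheet API, and None
stands in for an empty cell):

```python
from typing import Optional, Sequence


class NAError(Exception):
    """Hypothetical stand-in for a spreadsheet NA error value."""


def average(cells: Sequence[Optional[float]],
            propagate_na: bool = False) -> Optional[float]:
    """Average a range in which None marks an empty cell (an assumption)."""
    values = []
    for cell in cells:
        if cell is None:
            if propagate_na:
                # Approach B: treat the empty cell as an NA error and let it
                # travel downstream (modelled here as an exception).
                raise NAError("empty cell in range")
            # Approach A: skip the cell entirely; it is neither 0 nor "".
            continue
        values.append(cell)
    if not values:
        # Nothing left to average: report "no value" rather than 0.
        return None
    return sum(values) / len(values)
```

With approach A, average([1.0, None, 3.0]) is computed over the two
filled cells only; with propagate_na=True the same call raises NAError
instead.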

As I (probably) mentioned in some previous message [I can't find it, 
though], R has a more advanced mechanism to handle missing values, and I 
would welcome any improvement in this area regarding ODF.

If a value is missing, one can pass additional arguments to R functions
that specify how to handle the missing values.

To quote from R:
[http://cran.R-project.org]

> "na.action: is a function which indicates what should happen when the 
> data contain NAs. The default is set by the na.action setting of 
> options, and is na.fail if that is unset. The ‘factory-fresh’ default 
> is na.omit. Another possible value is NULL, no action. Value 
> na.exclude can be useful."

So, the default in R is *na.omit* and this is what spreadsheets should 
be doing, too. Omit the whole cell, don't include it in any 
calculations, do NOT even evaluate the empty cell! Definitely, do not 
evaluate it as "0"!
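
To make the difference concrete, here is a minimal sketch in Python
(not R). It is only loosely modelled on the na.action idea quoted
above: the handler names na_omit, na_fail and na_as_zero and the mean
function are hypothetical stand-ins, and None again represents an
empty cell.

```python
from typing import Callable, List, Optional, Sequence

Cells = Sequence[Optional[float]]


def na_omit(cells: Cells) -> List[float]:
    """Drop empty cells before the calculation sees them."""
    return [v for v in cells if v is not None]


def na_fail(cells: Cells) -> List[float]:
    """Refuse to calculate if any cell is empty."""
    if any(v is None for v in cells):
        raise ValueError("missing values in data")
    return list(cells)


def na_as_zero(cells: Cells) -> List[float]:
    """The implicit conversion criticised above: empty becomes 0."""
    return [0.0 if v is None else v for v in cells]


def mean(cells: Cells,
         na_action: Callable[[Cells], List[float]] = na_omit) -> float:
    """Arithmetic mean; the caller chooses how empty cells are handled."""
    values = na_action(cells)
    # Note: an all-empty range leaves no values and raises ZeroDivisionError.
    return sum(values) / len(values)


data = [4.0, None, 8.0]
print(mean(data))              # 6.0, the empty cell is omitted
print(mean(data, na_as_zero))  # 4.0, the empty cell silently counts as 0
```

The same three cells give 6.0 or 4.0 depending only on how the empty
one is treated, which is why the choice should be explicit and not
hidden in a conversion rule.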

I believe this is a far more transparent and flexible alternative.

Sincerely,

Leonard

> E.G., if you use a function that requires "Number", but instead is 
> given a reference to an empty cell, then the reference is converted to 
> the number 0. If the function requires a NumberSequence, all the empty 
> cells are ignored when creating a NumberSequence (and just like 
> strings, NumberSequences can be 0 length).
>
> --- David A. Wheeler

Follow-Ups:
- Re: [office-comment] Re: [office] Next set of public comments for review (#18-#35)
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>