office-comment message

Subject: [office-formula] problems with some functions, most statistical
From: Regina Henschel <rb.henschel@t-online.de>
To: office-comment@lists.oasis-open.org
Date: Sat, 08 Nov 2008 21:27:20 +0100

Dear members,

While working on the statistical functions in OpenOffice.org, I have come 
across some problems in the draft specification 
http://www.oasis-open.org/committees/download.php/29629/OpenDocument-formula-20081010.odt
Some of them are only editorial remarks, but others are really essential.

It would be nice if a mathematical expert in your group could have a 
look at the problems and give me feedback.

Best regards
Regina Henschel


== 6.17.7. BETADIST ==

The factor 1/(b-a) is missing in the density case, see 
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm
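
To illustrate the remark, here is a minimal Python sketch of the beta 
density on a general interval [a, b], in the form given on the NIST page 
above. The function name beta_density and its parameter names are only 
illustrative and are not taken from the specification.

```python
import math

def beta_density(x, alpha, beta, a=0.0, b=1.0):
    """Density of the beta distribution on [a, b] (illustrative helper)."""
    if not (a < x < b):
        # This sketch only evaluates the density inside the open interval.
        return 0.0
    # Beta function B(alpha, beta) expressed through the gamma function.
    beta_fn = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    t = (x - a) / (b - a)  # rescale x to the unit interval
    # The factor 1/(b-a) makes the density integrate to 1 over [a, b].
    return t ** (alpha - 1) * (1 - t) ** (beta - 1) / (beta_fn * (b - a))
```

With alpha = beta = 1 the distribution is uniform, so the density on 
[2, 6] has to be 1/(6-2); without the factor it would be 1.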


== 6.15.33 GAMMA ==

The constraint N>=0 is set, but GAMMA(0) is not defined. Therefore 
it should be at least N>0. But usually the Gamma function is defined for 
all real numbers with the exception of zero and the negative integers. So 
the constraint should be: N<>0 and N not a negative integer.
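
As an illustration only, the proposed constraint can be checked against 
Python's math.gamma, which accepts negative non-integers and raises 
ValueError at zero and at the negative integers. The helper name 
gamma_is_defined is made up for this sketch.

```python
import math

def gamma_is_defined(n):
    """True if n is neither zero nor a negative integer."""
    return not (n <= 0 and n == math.floor(n))

# Negative non-integers are fine.
for n in (0.5, -0.5, -1.5):
    print(n, math.gamma(n))

# Zero and the negative integers are poles of the gamma function.
for n in (0, -1, -2):
    try:
        math.gamma(n)
    except ValueError:
        print(n, "is not in the domain")
```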


== 6.15.35 GCD ==

"Return the largest value that can be evenly divided (no remainder) into 
the given numbers." should be "Return the largest value that can divide 
the given numbers evenly (no remainder)."


== 6.17.37 INTERCEPT ==

Only some editorial remarks to the text "Calculates the point at which a 
line will intersect the y-values by using known x-values and y-values."
1. It calculates not a point but a single value. 2. The line does not 
intersect the y-values but the y-axis. 3. It does not say that the line 
is the linear regression line of the known values.
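
A small Python sketch of what the description should convey: the result 
is a single number, namely the value at x=0 of the least-squares 
regression line through the known values. The argument order (y-values 
first) is assumed from the existing spreadsheet implementations.

```python
def intercept(ys, xs):
    """Value at x=0 of the least-squares regression line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The regression line passes through (mean_x, mean_y).
    return mean_y - slope * mean_x
```

For the points (0,1), (1,3), (2,5), which lie on y = 2x + 1, the result 
is 1.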


== 6.17.53 LEGACY.NORMSDIST ==

The text "This is exactly NORMDIST(x)." should be "This is exactly 
NORMDIST(x;0;1)." because mean parameter and standard deviation 
parameter are not optional.


== 6.17.60 PHI ==

The description is not as clear as possible. PHI is the density function 
of the standard normal distribution, therefore 
PHI(x)=NORMDIST(x;0;1;FALSE()).
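
The relation between PHI, LEGACY.NORMSDIST and NORMDIST can be sketched 
in Python as follows. The function names mirror the spreadsheet 
functions, but the code is only an illustration; in particular the 
default cumulative=True is an assumption about the optional fourth 
parameter.

```python
import math

def normdist(x, mean, stddev, cumulative=True):
    """Normal distribution: CDF if cumulative, otherwise the density."""
    z = (x - mean) / stddev
    if cumulative:
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))

def phi(x):
    """Density of the standard normal distribution."""
    return normdist(x, 0.0, 1.0, cumulative=False)

def legacy_normsdist(x):
    """Cumulative standard normal distribution, i.e. NORMDIST(x;0;1)."""
    return normdist(x, 0.0, 1.0)
```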


== 6.17.63 QUARTILE ==

The example calculates with Quart=1.5 although there is a constraint 
INT(Quart)=Quart.


== 6.17.64 RANK ==

The description "If not 0, Data is ranked in descending order." has to 
be "If not 0, Data is ranked in ascending order."
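
A minimal Python sketch of the intended meaning of the third parameter. 
The function below is illustrative only; it gives tied values the same 
rank.

```python
def rank(value, data, order=0):
    """Rank of value within data.

    order == 0: descending, the largest value gets rank 1.
    order != 0: ascending, the smallest value gets rank 1.
    """
    if order == 0:
        return 1 + sum(1 for d in data if d > value)
    return 1 + sum(1 for d in data if d < value)
```

For the data 10, 20, 30 the value 30 has rank 1 with order 0 and rank 3 
with a non-zero order.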


== 6.17.71 STDEVA ==

The text "cells with text are converted to 0;" conflicts with "The 
handling of strings is implementation defined."


== 6.17.76 TDIST ==

The formula shows the density function, but TDIST actually calculates 
the cumulative distribution function in OOo, Excel and Gnumeric.
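
The difference can be made concrete with a rough Python sketch. It 
assumes the three-argument form TDIST(x; df; tails) known from Excel, 
which returns a tail probability rather than the density. The tail 
probability is obtained here by simple numerical integration of the 
density; the function names and the step count are arbitrary choices for 
this illustration.

```python
import math

def t_density(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(x, df, tails=1, steps=2000):
    """Tail probability for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    one_tail = 0.5 - total * h / 3  # P(X > x)
    return tails * one_tail
```

For df=1 and x=1 the density is about 0.159, while the one-tailed 
probability is 0.25 and the two-tailed probability is 0.5.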


== 6.17.79 TRIMMEAN ==

The limits in the definition formula are wrong. Either count the data 
from 0, then sum from i=cutOff to n−1−cutOff; or count the data from 1, 
then sum from i=cutOff+1 to n−cutOff.
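
The zero-based variant of the limits, written as a short Python sketch. 
Here cut_off is the number of values dropped at each end, as in the 
definition formula, not the fractional argument of the function; the 
function name is only illustrative.

```python
def trimmed_mean(data, cut_off):
    """Mean of the sorted data without cut_off values at each end."""
    xs = sorted(data)
    n = len(xs)
    # Counting from 0: keep the indices cut_off .. n-1-cut_off (inclusive).
    kept = [xs[i] for i in range(cut_off, n - cut_off)]
    return sum(kept) / len(kept)
```

With the data 1, 2, 3, 4, 100 and cut_off=1, the values 2, 3, 4 remain, 
so the result is 3.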


== 6.17.80 TTEST ==

A lot of problems :(
(1) All formulas are missing the degrees of freedom, which are necessary 
for TDIST. In the unequal variance case it must be defined whether the 
degrees of freedom are made integer by FLOOR or whether they remain real.
(2) In the paired case: sqrt {n-1} is wrong because it is already 
contained in s_{X_1 - X_2}.
(3) In the equal variance case: the definition of s² is missing.
(4) In the unequal variance case: the definition does not match the usual 
definitions. In fact it is the definition for the equal variance case, 
with pooled variance.
(5) As TDIST actually returns the values of the cumulative distribution 
function, the definite integral in the formulas has to be revised 
together with TDIST.
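
For reference, the following Python sketch shows the test statistic and 
the degrees of freedom as they are usually defined in textbooks for the 
three cases. It is not taken from the draft or from any implementation; 
the Welch-Satterthwaite formula for the unequal variance case and the 
optional use_floor switch are assumptions made for this illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    """Sample variance with divisor n-1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_and_df(x1, x2, kind, use_floor=False):
    """Return (t statistic, degrees of freedom) for the given test kind."""
    n1, n2 = len(x1), len(x2)
    if kind == "paired":
        d = [a - b for a, b in zip(x1, x2)]
        # sample_var already divides by n-1, so only sqrt(n) appears here.
        return mean(d) / math.sqrt(sample_var(d) / n1), n1 - 1
    v1, v2 = sample_var(x1), sample_var(x2)
    diff = mean(x1) - mean(x2)
    if kind == "equal":
        # Pooled variance s^2.
        s2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        return diff / math.sqrt(s2 * (1 / n1 + 1 / n2)), n1 + n2 - 2
    if kind == "unequal":
        a, b = v1 / n1, v2 / n2
        # Welch-Satterthwaite approximation, in general not an integer.
        df = (a + b) ** 2 / (a * a / (n1 - 1) + b * b / (n2 - 1))
        return diff / math.sqrt(a + b), (math.floor(df) if use_floor else df)
    raise ValueError("kind must be 'paired', 'equal' or 'unequal'")
```

For the samples 1, 2, 3 and 2, 4, 6 the unequal variance case gives 
50/17 degrees of freedom, about 2.94, which FLOOR would reduce to 2.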


== 6.17.85 WEIBULL ==

Alpha and Beta are exchanged in the formula, compared with the current 
implementation in OOo, Excel and Gnumeric.
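
A Python sketch of the parameter roles as I understand the current 
implementations: Alpha is the shape parameter and Beta is the scale 
parameter. The function below is only an illustration under that 
assumption, not a copy of any implementation.

```python
import math

def weibull(x, alpha, beta, cumulative=True):
    """Weibull distribution with shape alpha and scale beta (assumed roles)."""
    if x < 0:
        return 0.0
    if cumulative:
        return 1.0 - math.exp(-((x / beta) ** alpha))
    return (alpha / beta) * (x / beta) ** (alpha - 1) * math.exp(-((x / beta) ** alpha))
```

Exchanging the two parameters changes the result: weibull(2, 1, 2) is 
1-exp(-1), whereas weibull(2, 2, 1) is 1-exp(-4).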


== 6.17.86 ZTEST ==

The draft specification defines the result as 1-P(-|z| <= Z <= |z|), 
which is a two-tailed form. But the current implementations use a 
one-tailed form. Apart from a real bug in OOo, Excel and OOo return
1-P(Z <= z) for z>=0 and
P(Z <= |z|) for z<0.
Gnumeric returns P(Z <= |z|) for z<0 too, but I do not understand the 
result in Gnumeric for z>=0.
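
The forms mentioned above can be compared with a small Python sketch. 
Here z stands for the already computed test statistic and Z for a 
standard normal variable; the function names are made up for this 
illustration, and the last function only restates the piecewise 
behaviour described above.

```python
import math

def std_normal_cdf(z):
    """P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ztest_two_tailed(z):
    """Form in the draft: 1 - P(-|z| <= Z <= |z|)."""
    return 2.0 * (1.0 - std_normal_cdf(abs(z)))

def ztest_one_tailed(z):
    """One-tailed form: 1 - P(Z <= z)."""
    return 1.0 - std_normal_cdf(z)

def ztest_as_described(z):
    """Piecewise result as described above for Excel and OOo."""
    if z >= 0:
        return 1.0 - std_normal_cdf(z)
    return std_normal_cdf(abs(z))
```

For z=1.96 the two-tailed form gives about 0.05 and the one-tailed form 
about 0.025.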

In other cases of changed semantics, for example the change from right 
tail to left tail in CHIDIST, the LEGACY prefix was introduced for the 
old version. Shouldn't there be a similar distinction in the case of 
ZTEST too? I know the specification is only about the internal names and 
using a prefixed name for the old version internally is possible. But the 
names in the specification give a strong hint how the functions should be 
named in the UI of the applications, and uniform names across 
applications would make working with different applications much easier 
for users.