*Subject*: **[office-formula] problems with some functions, most statistical**

*From*: **Regina Henschel <rb.henschel@t-online.de>**

*To*: office-comment@lists.oasis-open.org

*Date*: Sat, 08 Nov 2008 21:27:20 +0100

Dear members,

working on the statistical functions in OpenOffice.org I have come across some problems in the draft specification
http://www.oasis-open.org/committees/download.php/29629/OpenDocument-formula-20081010.odt

Some of them are only editorial remarks, but others are really essential. It would be nice if a mathematical expert in your group could have a look at the problems and give me feedback.

Best regards
Regina Henschel

== 6.17.7. BETADIST ==
The factor 1/(b-a) is missing in the density case, see
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm

== 6.15.33 GAMMA ==
The constraint N>=0 is set, but GAMMA(0) is not defined. Therefore it should be at least N>0. But usually the Gamma function is defined for all real numbers with the exception of zero and the negative integers. So the constraint should be: N<>0 and N not a negative integer.

== 6.15.35 GCD ==
"Return the largest value that can be evenly divided (no remainder) into the given numbers."
should be
"Return the largest value that can divide the given numbers evenly (no remainder)."

== 6.17.37 INTERCEPT ==
Only some editorial remarks on the text "Calculates the point at which a line will intersect the y-values by using known x-values and y-values."
1. It calculates not a point but a single value.
2. The line does not intersect the y-values but the y-axis.
3. It does not say that the line is the linear regression of the known values.

== 6.17.53 LEGACY.NORMSDIST ==
The text "This is exactly NORMDIST(x)." should be "This is exactly NORMDIST(x;0;1)." because the mean parameter and the standard deviation parameter are not optional.

== 6.17.60 PHI ==
The description is not as clear as possible. PHI is the density function of the standard normal distribution, therefore PHI(x)=NORMDIST(x;0;1;FALSE()).

== 6.17.63 QUARTILE ==
The example calculates with Quart=1.5 although there is a constraint INT(Quart)=Quart.

== 6.17.64 RANK ==
The description "If not 0, Data is ranked in descending order."
has to be "If not 0, Data is ranked in ascending order."

== 6.17.71 STDEVA ==
The text "cells with text are converted to 0;" conflicts with "The handling of strings is implementation defined."

== 6.17.76 TDIST ==
The formula shows the density function, but TDIST actually calculates the cumulative distribution function in OOo, Excel and Gnumeric.

== 6.17.79 TRIMMEAN ==
The limits in the definition formula are wrong. Either count the data from 0, then sum from i=cutOff to n−1−cutOff; or count the data from 1, then sum from i=cutOff+1 to n−cutOff.

== 6.17.80 TTEST ==
A lot of problems :(
(1) All formulas miss the degrees of freedom, which are necessary for TDIST. In the unequal variance case it must be defined whether the degrees of freedom are made integer by FLOOR or whether they remain real.
(2) In the paired case: sqrt {n-1} is wrong because it is already contained in s_{X_1 - X_2}.
(3) In the equal variance case: the definition of s² is missing.
(4) In the unequal variance case: the definition does not match the usual definitions. In fact it is the definition for the equal variance case, with pooled variance.
(5) As TDIST actually returns the values of the cumulative distribution function, the definite integral in the formulas has to be revised together with TDIST.

== 6.17.85 WEIBULL ==
Alpha and Beta are exchanged in the formula, compared with the current implementations in OOo, Excel and Gnumeric.

== 6.17.86 ZTEST ==
The draft specification defines the result as 1-P(-|z| <= Z <= |z|), which is a two-tailed form. But the current implementations use a one-tailed form. Apart from a real bug in OOo, Excel and OOo return 1-P(Z <= z) for z>=0 and P(Z <= |z|) for z<0. Gnumeric returns P(Z <= |z|) for z<0 too, but I do not understand the result in Gnumeric for z>=0.

In other cases of changed semantics, for example the change from right tail to left tail in CHIDIST, LEGACY was introduced for the old version. Shouldn't there be a differentiation in the case of ZTEST too?

I know the specification is only about the internal names, and using a prefixed name for the old version internally is possible. But the names in the specification give a strong hint how the functions should be named in the UI of the applications, and uniform names across applications would make working with different applications much easier for users.
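
Three of the remarks above (GAMMA, TRIMMEAN, ZTEST) can be checked numerically. The following Python sketch is only an illustration and is not taken from the draft or from any of the implementations mentioned. All function names are hypothetical, `cut_off` is passed in as a plain integer because its definition is not quoted here, and the data is assumed to be sorted already.

```python
import math


def gamma_is_defined(n: float) -> bool:
    """True unless n is zero or a negative integer (the poles of Gamma)."""
    return not (n <= 0 and n == math.floor(n))


def trimmed_mean_zero_based(sorted_data, cut_off):
    """Mean of x[i] for i = cut_off .. n-1-cut_off, counting the data from 0.

    Assumes 2 * cut_off < len(sorted_data), so at least one value is kept.
    """
    n = len(sorted_data)
    kept = [sorted_data[i] for i in range(cut_off, n - cut_off)]
    return sum(kept) / len(kept)


def trimmed_mean_one_based(sorted_data, cut_off):
    """Mean of x_i for i = cut_off+1 .. n-cut_off, counting the data from 1."""
    n = len(sorted_data)
    kept = [sorted_data[i - 1] for i in range(cut_off + 1, n - cut_off + 1)]
    return sum(kept) / len(kept)


def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ztest_two_tailed(z: float) -> float:
    """1 - P(-|z| <= Z <= |z|), the two-tailed form."""
    return 1.0 - (std_normal_cdf(abs(z)) - std_normal_cdf(-abs(z)))


def ztest_one_tailed(z: float) -> float:
    """1 - P(Z <= z), the upper-tail form."""
    return 1.0 - std_normal_cdf(z)


if __name__ == "__main__":
    for n in (0, -1, -0.5, 2.5):
        print(n, gamma_is_defined(n))
    data = [1, 2, 3, 4, 100]
    print(trimmed_mean_zero_based(data, 1), trimmed_mean_one_based(data, 1))
    print(ztest_two_tailed(1.0), ztest_one_tailed(1.0))
```

Both index conventions keep the same n − 2·cutOff values, so the two trimmed means agree. For z>=0 the two-tailed value is twice the one-tailed one, which is why the two forms cannot share a name without changing results.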

**Follow-Ups**: **Re: [office-comment] [office-formula] problems with some functions, most statistical** *From:* robert_weir@us.ibm.com
