OASIS Mailing List Archives

office-comment message



Subject: Re: [office-comment] ODFF: list of suggestions


Hello everyone,

I have waded through more of the statistical functions (OASIS Open Office
Specification, 2007-12-28):

1.) BINOMDIST()
Consider rewriting that section; calling it "not very informative" would be
an understatement. I strongly suggest explaining it both as a density
(probability mass function) and as a distribution function (cumulative
probability), and using the dist-keyword consistently.
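To make the density/distribution distinction concrete, here is a minimal sketch of the two functions the section could describe. The function names and the `binomdist(k, n, p, cumulative)` dispatcher are my own hypothetical illustration of a BINOMDIST-style call, not the wording of the draft.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Density (probability mass function): P(X = k) for X ~ Binomial(n, p)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Distribution function: P(X <= k), the sum of the mass function up to k."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def binomdist(k: int, n: int, p: float, cumulative: bool) -> float:
    """Hypothetical dispatcher mirroring a BINOMDIST(k; n; p; cumulative) call."""
    return binom_cdf(k, n, p) if cumulative else binom_pmf(k, n, p)


print(binomdist(2, 4, 0.5, False))  # 6/16 = 0.375
print(binomdist(2, 4, 0.5, True))   # 11/16 = 0.6875
```

The cumulative flag then simply selects between the two well-defined functions, which is exactly what the section should spell out.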

2.) 6.17.23 FISHER transformation
Apart from the fact that the name is easily confused with the
*gold-standard* test for count data (Fisher's exact test), the formula
is almost certainly wrong. If I read it correctly, it contains
(1-r)/(1-r), which is always 1. It should be (1+r)/(1-r), i.e.
FISHER(r) = 1/2 * ln((1+r)/(1-r)).

The same applies to the FISHERINV function.
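A short sketch of the transformation and its inverse shows why the printed formula cannot be right: with (1-r)/(1-r) the result collapses to 1/2 * ln(1) = 0 for every r. The Python names below are hypothetical; the inverse (exp(2y) - 1)/(exp(2y) + 1) is computed via tanh, which is the same function.

```python
import math


def fisher(r: float) -> float:
    """Fisher transformation: 1/2 * ln((1 + r) / (1 - r)), defined for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_inv(y: float) -> float:
    """Inverse transformation: (exp(2y) - 1) / (exp(2y) + 1).

    This equals tanh(y); math.tanh is used because it avoids overflow of
    exp(2y) for large |y|.
    """
    return math.tanh(y)


def misprinted_fisher(r: float) -> float:
    """The formula as I read it in the draft: 1/2 * ln((1 - r) / (1 - r)) = 0."""
    return 0.5 * math.log((1.0 - r) / (1.0 - r))


print(fisher(0.5))              # equals math.atanh(0.5)
print(fisher_inv(fisher(0.3)))  # recovers 0.3 (up to rounding)
print(misprinted_fisher(0.5))   # 0.0, and the same for every r other than 1
```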

I would strongly suggest renaming these functions to something like
TRANSFISHER() and TRANSFISHERINV(). You may want to implement Fisher's
exact test later, and it would make sense to reserve this name for that
test.

3.) LEGACY.CHITEST
> Note: Applications usually describe the CHITEST function as a 
> Chi-square independence test. From a mathematical point of view this 
> is not correct, as that would not involve testing some actual data 
> against a set of expected values. It resembles more a Goodness-of-Fit
> test, but then the way the degrees of freedom are calculated doesn't
> make sense. This is specified to be interoperable with Excel and
> OpenOffice.org. Gnumeric gets different results if the number of rows 
> and columns both are greater than 2.

I suggest comparing the results with Fisher's exact test, e.g. fisher.test() in R.
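For readers without R at hand, here is a minimal sketch of Fisher's exact test for a 2x2 table. It assumes the common two-sided convention (sum the probabilities of all tables that are at most as likely as the observed one); the small relative tolerance for ties is my own choice, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value of Fisher's exact test for the table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The two-sided p-value sums the probabilities of every
    table that is at most as likely as the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Probability of the table whose top-left cell is x (margins fixed).
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(p for p in map(table_prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70, about 0.4857
```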

Also, statistical packages (R, Epi Info, SPSS, ...) do NOT need the
expected values; they compute them automatically from the n*m table.
I wonder why spreadsheets do NOT do this automatically as well. Most
users simply fail to compute the correct values. I try to teach them,
but almost everyone gets it wrong a week later. [It is far easier to
remember the shortcut for a 2x2 table, i.e. (ad-bc)^2 * N/(n1 * n2 * n3 * n4),
where n1 to n4 are the four marginal subtotals, than to compute the
expected values accurately.]
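The following sketch shows what "computing the expected values automatically" means: each expected count is row total * column total / grand total, and the degrees of freedom for an n*m table are (n-1)*(m-1). For a 2x2 table the result agrees with the shortcut formula. The function names are hypothetical and no continuity correction is applied.

```python
from typing import List, Tuple


def expected_counts(table: List[List[float]]) -> List[List[float]]:
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an n*m table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_2x2_shortcut(a: float, b: float, c: float, d: float) -> float:
    """Shortcut for [[a, b], [c, d]]: (ad - bc)^2 * N / (product of the 4 margins)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


stat, dof = chi_square_independence([[10, 20], [30, 40]])
print(stat, dof)                                # about 0.7937, 1
print(chi_square_2x2_shortcut(10, 20, 30, 40))  # same value via the shortcut
```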

4.) 6.17.74 TDIST
Again, the same problem: only the density function is described; there is
NO distribution function (nor a quantile function or random-number
generation). This should be made consistent across ALL distribution functions.

[For the two-tailed TINV: TINV(p) = quantile(1 - p/2), i.e. quantile(q) = TINV(2*(1 - q)) for q >= 0.5]
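As an illustration of the full set (density, distribution function, quantile), here is a minimal sketch for Student's t. It assumes nothing beyond the Python standard library: the distribution function integrates the density numerically with Simpson's rule and the quantile is found by bisection, so it is a teaching sketch rather than a production-grade algorithm. The helper `tinv_two_tailed` is hypothetical and assumes TINV is two-tailed, as in Excel.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """Distribution function P(T <= x), integrating the density with Simpson's rule.

    Uses symmetry: P(T <= x) = 0.5 + sign(x) * integral of the density from 0 to |x|.
    """
    if x == 0:
        return 0.5
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(q: float, df: float, tol: float = 1e-10) -> float:
    """Quantile function: the t with P(T <= t) = q, found by bisection."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lo, hi = -1.0, 1.0
    while t_cdf(lo, df) > q:  # widen the bracket until it contains the quantile
        lo *= 2
    while t_cdf(hi, df) < q:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tinv_two_tailed(p: float, df: float) -> float:
    """Hypothetical helper assuming a two-tailed TINV: the t with P(|T| > t) = p."""
    return t_quantile(1 - p / 2, df)


print(t_cdf(1.0, 1))               # about 0.75 (df = 1 is the Cauchy distribution)
print(tinv_two_tailed(0.05, 10))   # about 2.228
```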

5.) ZTEST
> TODO: OOo Calc and Gnumeric produce the same results. Excel (2007 
> beta) claims to calculate the one-tailed test.
> All produce different results than expected! What OOo Calc and 
> Gnumeric calculate is out of my scope. Excel tries to calculate the 
> probability by integrating from minus infinity to z and subtracts
> this from one. That would only be right, if the absolute value of z is 
> taken, not the signed z! (I speak already of the one-tailed results 
> for this case, so no confusion here.)
> So, either I made a mistake/misinterpretation or all three apps don't 
> get it right(TM)

So I tested it in R. Base R has NO standard z-test, BUT the t-test
will do it, too. [The package 'TeachingDemos' does define a z-test,
z.test().]

> > z.test(x, mu=15, sd=sd(x))
>
>         One Sample z-test
>
> data:  x
> z = 0.3111, n = 30.000, Std. Dev. = 8.803, Std. Dev. of the sample mean
> = 1.607, p-value = 0.7557
> alternative hypothesis: true mean is not equal to 15
> 95 percent confidence interval:
>  12.34980 18.65020
> sample estimates:
> mean of x
>      15.5
where x <- 1:30 (the integers 1 to 30)

The same example gives a value of 0.38 in Calc. I have NO idea what
0.38 stands for. I also just noticed that it seems impossible to compute
a one-sample t-test in Calc, i.e. a single array against an expected
mean, or did I miss something?
[The t-test in R gives virtually identical results, because with a sample
size of 30 the t-distribution is already close to the normal (z) distribution.]
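A minimal sketch of the same one-sample z-test, plugging in the sample standard deviation as in the R call above (requires Python 3.8+ for statistics.NormalDist; the function name is hypothetical). It reproduces z = 0.3111, the two-sided p-value 0.7557 and the confidence interval from R. If I compute it correctly, the one-tailed upper-tail value 1 - Phi(z) is about 0.378, which would match Calc's 0.38; this suggests, but does not prove, that Calc returns the one-tailed probability for the signed z.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sample_z_test(x, mu, sigma=None):
    """One-sample z-test of the mean of x against mu.

    If sigma is None, the sample standard deviation is plugged in, matching
    the R call z.test(x, mu=15, sd=sd(x)) in the transcript.
    """
    n = len(x)
    s = stdev(x) if sigma is None else sigma
    se = s / math.sqrt(n)            # standard deviation of the sample mean
    z = (mean(x) - mu) / se
    phi = NormalDist().cdf           # standard normal distribution function
    z_crit = NormalDist().inv_cdf(0.975)
    return {
        "z": z,
        "p_upper": 1 - phi(z),                 # one-tailed, H1: mean > mu (signed z)
        "p_two_sided": 2 * (1 - phi(abs(z))),  # two-tailed, uses |z|
        "ci95": (mean(x) - z_crit * se, mean(x) + z_crit * se),
    }


result = one_sample_z_test(list(range(1, 31)), mu=15)
print(result)
```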

Sincerely,

Leonard

