Subject: Re: [office-comment] ODFF: list of suggestions
Hello everyone,

I waded through more statistical functions (OASIS Open Office Specification, 2007-12-28).

1.) BINOMDIST()

Consider rewriting that section. Saying that it is not very informative is an understatement. I strongly suggest explaining it as a density function and as a distribution function, respectively (and using the dist-keyword consistently).

2.) 6.17.23 FISHER transformation

Beyond the fact that the name is easily misinterpreted as the *gold-standard* test for count data (the Fisher exact test), the formula is most certainly wrong. If I read it correctly, it contains (1-r)/(1-r), which would be 1. That has to be wrong; it should be (1+r)/(1-r). The same applies to the FISHERINV function.

I strongly suggest renaming these functions to something like TRANSFISHER() and TRANSFISHERINV(). You may want to implement the Fisher exact test later, and it makes sense to reserve this name for that test.

3.) LEGACY.CHITEST

> Note: Applications usually describe the CHITEST function as a
> Chi-square independence test. From a mathematical point of view this
> is not correct, as that would not involve testing some actual data
> against a set of expected values. It resembles more a Goodness-for-Fit
> test, but how the degree of freedom is calculated actually doesn't
> make sense then. This is specified to be inter operable with Excel and
> OpenOffice.org. Gnumeric gets different results if the number of rows
> and columns both are greater than 2.

Well, I suggest comparing the results with the Fisher exact test, e.g. in R. Also, statistical packages (R, EPI INFO, SPSS, ...) do NOT need the expected values, as they compute them automatically from the n*m table. I wonder why spreadsheets do NOT do this automatically as well. Most users simply fail to compute the correct values. I try to teach them, but almost everyone gets it wrong again a week later.
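To illustrate what "computing the expected values automatically from the n*m table" could look like, here is a minimal Python sketch. It is not taken from the specification or from any spreadsheet; the function names and the example counts are made up for illustration. It derives the expected counts from the row and column totals, computes the Pearson chi-square statistic, and cross-checks a 2x2 table against the shortcut (ad-bc)^2 * N/(n1 * n2 * n3 * n4).

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an n*m table."""
    expected = expected_counts(table)
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_2x2_shortcut(table):
    """2x2 shortcut: (ad-bc)^2 * N / (n1*n2*n3*n4), n1..n4 being the subtotals."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


# Made-up counts, for illustration only.
counts = [[12, 5], [7, 16]]
stat, df = chi_square(counts)
print(stat, df, chi_square_2x2_shortcut(counts))
```

The user only supplies the observed table; the expected values never have to be entered by hand. For a 2x2 table the general statistic and the shortcut agree up to floating-point rounding.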
[It is way easier to remember the shortcut for a 2x2 table, i.e. (ad-bc)^2 * N/(n1 * n2 * n3 * n4), where n1-4 are the 4 subtotals, than to compute the expected values accurately.]

4.) 6.17.74 TDIST

Again, the same problem: only the density function is described, NOT the distribution function (nor the quantile function or the random generation functions). This should be made consistent across ALL distribution functions.
[quantile(p) = TINV(1 - (p/2)) for 2-tailed distribution]

5.) ZTEST

> TODO: OOo Calc and Gnumeric produce the same results. Excel (2007
> beta) claims to calculate the one-tailed test.
> All produce different results than expected! What OOo Calc and
> Gnumeric calculate is out of my scope. Excel tries to calculate the
> probability by integrating from minus infinity to z and substracts
> this from one. That would only be right, if the absolute value of z is
> taken, not the signed z! (I speak already of the one-tailed results
> for this case, so no confusion here.)
> So, either I made a mistake/misinterpretation or all three apps don't
> get it right(TM)

Well, I'll test it in R. There is NO standard z-test in R, BUT the t-test will do as well. [There is also the package 'TeachingDemos', which does define a z-test.]

> > z.test(x, mu=15, sd=sd(x))
>
>         One Sample z-test
>
> data:  x
> z = 0.3111, n = 30.000, Std. Dev. = 8.803, Std. Dev. of the sample mean
> = 1.607, p-value = 0.7557
> alternative hypothesis: true mean is not equal to 15
> 95 percent confidence interval:
>  12.34980 18.65020
> sample estimates:
> mean of x
>      15.5

where x = 1..30

This same example gives a value of 0.38 in Calc. I have NO idea what 0.38 stands for. And I just noticed that it is impossible to compute a t-test on a single array in Calc against the expected mean - or did I miss something?

[The t-test in R gives virtually identical results, because a sample size of 30 is large enough to approximate the z-distribution.]

Sincerely,

Leonard
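For the ZTEST example with x = 1..30 and mu = 15, the following Python sketch recomputes the one-sample z-test using only the standard library. The function name is made up for illustration, and the code only assumes the usual textbook definition (z = (mean - mu) / (sd / sqrt(n)), with the sample standard deviation plugged in for sd); it makes no claim about what Calc, Gnumeric or Excel compute internally.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu, sd):
    """Return (z, upper-tail p, two-sided p) for a one-sample z-test."""
    se = sd / sqrt(len(sample))              # std. dev. of the sample mean
    z = (mean(sample) - mu) / se
    std_normal = NormalDist()                # standard normal, mean 0, sd 1
    upper_tail = 1.0 - std_normal.cdf(z)     # one-tailed: P(Z > z), signed z
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, upper_tail, two_sided


x = list(range(1, 31))                       # x = 1..30, as in the example
z, p_upper, p_two_sided = one_sample_z_test(x, mu=15, sd=stdev(x))
print(round(z, 4), round(p_upper, 4), round(p_two_sided, 4))
```

The two-sided value agrees with the R output above (z = 0.3111, p-value = 0.7557). The upper-tail value comes out at about 0.378, which would be consistent with the 0.38 that Calc shows if Calc returns the one-tailed probability 1 - Phi(z); that is only a guess based on this one example.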