*Subject*: **Re: [office-formula] CHITEST definition**

*From*: **Eike Rathke <erack@sun.com>**

*To*: office-formula@lists.oasis-open.org

*Date*: Wed, 14 Mar 2007 17:44:50 +0100

Hi Andreas,

On Monday, 2007-03-12 14:01:52 -0600, Andreas J. Guelzow wrote:

> > It claims to be an independence test.
>
> I know. But then it shouldn't use the "expected values" but calculate
> them from the row and column sums of the observed values.
>
> > On the other hand ...
> >
> > > In the case of a Goodness-of-Fit test one would have observed and
> > > expected frequencies (corresponding to each other) and the degrees of
> > > freedom should be n-1.
> >
> > ... it gets two arrays passed, one of actual/observed values and one of
> > expected values. See also the current definition in the latest draft
> > document uploaded on Friday. If each column of observed values
> > represented a different group, matching those of the expected values, what
> > would n-1 be then?
>
> So you really are performing several chi-sq tests in a single call?

Not really. Internally, LEGACY.CHIDIST is called with the chi-square
statistic computed over all values and with the degrees of freedom; see the
formula in the latest draft. The result OOo gets is the same (+- epsilon) as
that of Excel. Gnumeric gets a different result if rows and columns are each
greater than 2, probably because it uses different degrees of freedom.
KSpread doesn't have that function.

> > For example:
> >
> > observed: x,y   expected: X,Y   =CHITEST(A1:B4;D1:E4)
> >
> >  | A  B  C  D  E
> > -+--------------
> > 1| x  y     X  Y
> > 2| x  y     X  Y
> > 3| x  y     X  Y
> > 4| x  y     X  Y
> >
> > How would the (rows-1)*(cols-1) fit in there?
>
> I can't see how that fits.

Well, it seems that is what Excel and OOo do, though. We should call the
function LEGACY.CHITEST ...

> Neter/Wasserman/Whitmore:
>
> When the sampled population has the probability distribution specified
> in H0 and the sample size n is reasonably large, then
> X^2 is chi^2 distributed with k-m-1 degrees of freedom, where k is the
> number of classes and m the number of parameters estimated from the
> sample data.

Whatever the "number of parameters estimated" may be.
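For reference, here is a minimal Python sketch of the computation as described in this thread, assuming LEGACY.CHITEST returns the right-tail chi-square probability of sum((O-E)^2/E) over all cells, with (rows-1)*(cols-1) degrees of freedom. All function names are hypothetical and not taken from OOo, Excel or Gnumeric. The fallback of rows*cols-1 degrees of freedom for a single row or column is an assumption, not something stated in the thread. The helper expected_from_margins only illustrates Andreas's point that a true independence test would derive the expected values from the row and column sums instead of taking them as an argument.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of two equally shaped 2-D lists."""
    if len(observed) != len(expected) or any(
        len(o) != len(e) for o, e in zip(observed, expected)
    ):
        raise ValueError("observed and expected must have the same shape")
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def legacy_chitest_df(rows, cols):
    """Degrees of freedom: (rows-1)*(cols-1) as discussed in the thread.

    ASSUMPTION: for a single row or column, fall back to rows*cols - 1.
    """
    if rows > 1 and cols > 1:
        return (rows - 1) * (cols - 1)
    return rows * cols - 1


def chi_square_upper_tail(x, df):
    """Right-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be >= 1")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 start the recurrence.
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    # with each term evaluated in log space to avoid overflow for large df.
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def legacy_chitest(observed, expected):
    """Hypothetical LEGACY.CHITEST: one statistic over all cells, one df."""
    rows, cols = len(observed), len(observed[0])
    stat = chi_square_statistic(observed, expected)
    return chi_square_upper_tail(stat, legacy_chitest_df(rows, cols))


def expected_from_margins(observed):
    """Independence-test expectations: row_sum * col_sum / grand_total."""
    row_sums = [sum(r) for r in observed]
    col_sums = [sum(c) for c in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]
```

For the 4-row, 2-column example above (A1:B4 against D1:E4), this sketch would use (4-1)*(2-1) = 3 degrees of freedom, even though each column might represent a separate group.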
I think we should not insist on understanding what Excel had in mind when
introducing the function (though it would be great if someone could come up
with a real explanation), but simply spec how it works and add a note about
its weirdness.

Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO
statements of spreadsheets. --Robert Weir on the OpenDocument formula
subcommittee's list.

