
# office-formula message


Subject: Re: [office-formula] CHITEST definition

• From: Eike Rathke <erack@sun.com>
• To: office-formula@lists.oasis-open.org
• Date: Wed, 14 Mar 2007 17:44:50 +0100

```
Hi Andreas,

On Monday, 2007-03-12 14:01:52 -0600, Andreas J. Guelzow wrote:

> > It claims to be an independence test.
>
> I know. But then it shouldn't use the "expected values" but calculate
> them from the row and column sums of the observed values.
>
> > On the other hand ...
> >
> > > In the case of a Goodness-of-Fit test one would have observed and
> > > expected frequencies (corresponding to each other) and the degrees of
> > > freedom should be n-1.
> >
> > ... it gets two arrays passed, one of actual/observed values and one of
> > expected values. See also the current definition in the latest draft
> > document uploaded on Friday. If each column of observed values
> > represented a different group, matching those of the expected values,
> > what would n-1 be then?
>
> So you really are performing several chi-sq tests in a single call?

Not really. Internally, LEGACY.CHIDIST is called with the Chi-square
computed over all values and the degrees of freedom; see the formula in
the latest draft. The result OOo gets is the same (+/- epsilon) as that
of Excel. Gnumeric gets a different result if rows and columns are each
greater than 2, probably because it uses a different number of degrees
of freedom. KSpread doesn't have that function.
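
The following is a minimal Python sketch of the computation described
above, assuming the behaviour reported for Excel and OOo. The Chi-square
statistic is summed over every cell. It is then passed, with
(rows-1)*(cols-1) degrees of freedom, to an upper-tail Chi-square
probability, which is the role LEGACY.CHIDIST plays. The names chitest
and chi2_upper_tail are hypothetical. The fallback to n-1 degrees of
freedom for a single row or column is an assumption taken from Excel's
documentation, not from this thread.

~~~python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for a Chi-square variable with integer df >= 1.

    Uses Q(x; 1) = erfc(sqrt(x/2)), Q(x; 2) = exp(-x/2) and the recurrence
    Q(x; k+2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    k = 2 if df % 2 == 0 else 1
    q = math.exp(-half) if k == 2 else math.erfc(math.sqrt(half))
    while k < df:
        # Add the recurrence term, computed in log space for stability.
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chitest(observed, expected):
    """Hypothetical CHITEST sketch; arguments are equal-sized 2-D lists."""
    rows, cols = len(observed), len(observed[0])
    if len(expected) != rows or any(len(r) != cols for r in observed + expected):
        raise ValueError("observed and expected must have the same shape")
    # Chi-square of all values: sum over every cell of (O - E)^2 / E.
    stat = sum(
        (o - e) ** 2 / e
        for orow, erow in zip(observed, expected)
        for o, e in zip(orow, erow)
    )
    # (rows-1)*(cols-1) as discussed in the thread; the n-1 fallback for a
    # single row or column is an assumption based on Excel's documentation.
    if rows > 1 and cols > 1:
        df = (rows - 1) * (cols - 1)
    else:
        df = rows * cols - 1
    return chi2_upper_tail(stat, df)
~~~

With the 4x2 layout quoted below (A1:B4 against D1:E4), this sketch uses
(4-1)*(2-1) = 3 degrees of freedom. The goodness-of-fit reading (n-1)
would use 7.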

> > For example:
> >
> > observed: x,y   expected: X,Y   =CHITEST(A1:B4;D1:E4)
> >
> >  | A  B  C  D  E
> > -+--------------
> > 1| x  y     X  Y
> > 2| x  y     X  Y
> > 3| x  y     X  Y
> > 4| x  y     X  Y
> >
> > How would the (rows-1)*(cols-1) fit in there?
>
> I can't see how that fits.

Well, it seems that is what Excel and OOo do, though. We should call the
function LEGACY.CHITEST ...
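
For contrast, the independence test Andreas describes derives the
expected values from the row and column sums of the observed table,
E_ij = R_i * C_j / N, instead of taking them as a second argument. Below
is a minimal sketch of that textbook contingency-table construction. The
function names independence_expected and independence_statistic are
hypothetical, and this is not a description of what CHITEST does.

~~~python
def independence_expected(observed):
    """Expected counts under independence: row_sum * col_sum / grand_total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("table must contain at least one non-zero count")
    return [[r * c / total for c in col_sums] for r in row_sums]


def independence_statistic(observed):
    """Return (chi_square, df) for an r x c table of observed counts."""
    expected = independence_expected(observed)
    stat = sum(
        (o - e) ** 2 / e
        for orow, erow in zip(observed, expected)
        for o, e in zip(orow, erow)
        if e > 0  # a zero expected count implies an all-zero row or column
    )
    rows, cols = len(observed), len(observed[0])
    return stat, (rows - 1) * (cols - 1)
~~~

Feeding these derived values into the CHITEST computation is the case
in which (rows-1)*(cols-1) degrees of freedom is the textbook choice.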

> Neter/Wasserman/Whitmore:
>
> When the sampled population has the probability distribution specified
> in H0 and the sample size n is reasonably large, then
> X^2 is chi^2-distributed with k-m-1 degrees of freedom, where k is the
> number of classes and m the number of parameters estimated from the
> sample data.

Whatever the "number of parameters estimated" may be ...
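
For the independence case, the textbook reading of k-m-1 does yield
(rows-1)*(cols-1). A table with r rows and c columns has k = rc classes.
The estimated parameters are the marginal probabilities: r-1 for the
rows and c-1 for the columns, since the last one in each set follows
from the others summing to 1.

~~~latex
\begin{aligned}
\mathrm{df} &= k - m - 1 \\
            &= rc - \bigl[(r-1) + (c-1)\bigr] - 1 \\
            &= rc - r - c + 1 \\
            &= (r-1)(c-1)
\end{aligned}
~~~

For a goodness-of-fit test with fully specified expected values, m = 0
and k = n, so the formula reduces to the n-1 mentioned earlier in the
thread.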

I think we should not insist on understanding what Excel had in mind
when introducing the function (though it would be great if someone could
come up with a real explanation) but simply spec how it works, and add