office-formula message

Subject: Re: [office-formula] CHITEST definition

From: Eike Rathke <erack@sun.com>
To: office-formula@lists.oasis-open.org
Date: Wed, 14 Mar 2007 17:44:50 +0100

Hi Andreas,

On Monday, 2007-03-12 14:01:52 -0600, Andreas J. Guelzow wrote:

> > It claims to be an independence test. 
> 
> I know. But then it shouldn't use the "expected values" but calculate
> them from the row and column sums of the observed values.
> 
> > On the other hand ...
> > 
> > > In the case of a Goodness-of-Fit test one would have observed and
> > > expected frequencies (corresponding to each other) and the degree of
> > > freedom should be n-1.
> > 
> > ... it gets two arrays passed, one of actual/observed values and one of
> > expected values. See also the current definition in the latest draft
> > document uploaded on Friday. If each column of observed values would
> > represent a different group, matching those of the expected values, what
> > would n-1 be then? 
> 
> So you really are performing several chi-sq tests in a single call?

Not really. Internally, LEGACY.CHIDIST is called with the chi-square
statistic computed over all values and the degrees of freedom; see the
formula in the latest draft. The result OOo gets is the same (within
epsilon) as that of Excel. Gnumeric gets a different result if the
numbers of rows and columns are both greater than 2, probably because it
uses different degrees of freedom. KSpread doesn't have that function.
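
As a rough sketch of the computation described above: the chi-square
statistic is summed over all cells and then passed, with the degrees of
freedom, to an upper-tail chi-square probability. The names
legacy_chitest and chi2_sf are hypothetical, the degrees of freedom are
assumed to be (rows-1)*(cols-1), and the fallback to n-1 for a single
row or column is my own assumption, not taken from the draft.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i<m} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df = 2m+1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i<m} x^i / (2i+1)!!
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def legacy_chitest(observed, expected):
    """Sketch of CHITEST on two equally sized 2-D lists of frequencies."""
    rows, cols = len(observed), len(observed[0])
    # One chi-square statistic over all cells, not one per column.
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    if rows > 1 and cols > 1:
        df = (rows - 1) * (cols - 1)  # assumed from the discussion
    else:
        df = max(rows, cols) - 1  # assumption: single row or column
    return chi2_sf(chi_sq, df)


# Hypothetical data shaped like the A1:B4 / D1:E4 example (4 rows, 2 columns).
observed = [[10, 20], [12, 18], [14, 16], [16, 14]]
expected = [[15, 15], [15, 15], [15, 15], [15, 15]]
p_value = legacy_chitest(observed, expected)
```

With this 4x2 layout the sketch uses 3 degrees of freedom, whereas
treating the 8 cells as one goodness-of-fit test would use 7.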

> > For example:
> > 
> > observed: x,y   expected: X,Y   =CHITEST(A1:B4;D1:E4)
> > 
> >  | A  B  C  D  E
> > -+--------------
> > 1| x  y     X  Y
> > 2| x  y     X  Y
> > 3| x  y     X  Y
> > 4| x  y     X  Y
> > 
> > How would the (rows-1)*(cols-1) fit in there?
> 
> I can't see how that fits.

Well, it seems that is what Excel and OOo do though. We should call the
function LEGACY.CHITEST ...

> Neter/Wasserman/Whitmore:
> 
> When the sampled population has the probability distribution specified
> in H0 and the sample size n is reasonably large then
> X^2 is chi^2 distributed with k-m-1 degrees of freedom where k is the
> number of classes and m the number of parameters estimated from the
> sample data.

Whatever the "number of parameters estimated" may be..
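
For what it's worth, here is how I read the quoted k-m-1 rule in the two
cases under discussion. The symbols r and c (rows and columns of the
observed table) are mine, and the assumption is that an independence
test estimates the row and column proportions from the sample, while a
goodness-of-fit test with given expected values estimates nothing.

```latex
\begin{align*}
\text{goodness of fit, expected values given:}\quad
  & m = 0, \qquad \mathrm{df} = k - 1 \\
\text{independence, } r \times c \text{ table:}\quad
  & k = rc, \qquad m = (r-1) + (c-1), \\
  & \mathrm{df} = rc - (r-1) - (c-1) - 1 = (r-1)(c-1)
\end{align*}
```

For the 4x2 example above, the first reading gives 8-1 = 7 and the
second gives (4-1)*(2-1) = 3.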

I think we should not insist on understanding what Excel had in mind
when introducing the function (though it would be great if someone could
come up with a real explanation) but simply spec how it works, and add
a note about its weirdness.

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO statements
of spreadsheets.  --Robert Weir on the OpenDocument formula subcommittee's list.
