OASIS Mailing List Archives



office message


Subject: [OASIS Issue Tracker] Commented: (OFFICE-2309) LEGACY.CHITEST

    [ http://tools.oasis-open.org/issues/browse/OFFICE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18627#action_18627 ] 

Andreas Guelzow  commented on OFFICE-2309:

Gnumeric assumes that LEGACY.CHITEST is a goodness-of-fit test. This is the only interpretation that really makes sense when the user is asked to provide both observed and expected values. In that case the degrees of freedom of the chi-square distribution involved are (number of observations minus 1).
OOo, Excel and Gnumeric appear to agree on this one if the ranges are 1 by n or n by 1.

If the "observed" and "expected" ranges are n by m with n and m larger than 1, then OOo and Excel use (n-1)*(m-1) degrees of freedom. This would be appropriate if the expected values were calculated from the observed ones (i.e. if one were performing a test of independence or a test of homogeneity). Neither OOo nor Excel confirms that the expected values are correct in that case, and in fact they should not even be required to be given.
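
For illustration, this is how the expected values for a test of independence would be derived from the observed table alone. It is a minimal Python sketch, not code from any of the applications discussed; the function name and the sample counts are made up.

```python
def expected_from_margins(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 by 2 table of observed counts.
observed = [[10, 20], [30, 40]]
expected = expected_from_margins(observed)
```

Because the expected values follow entirely from the row and column totals, a user-supplied "expected" range adds no information in this setting.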

I expected that in this case Gnumeric would also calculate a goodness-of-fit test (i.e. use a df of n*m-1), but looking at the code, that isn't true. Gnumeric's calculation is as meaningless as OOo's and Excel's, just different. (So we really should just ignore what Gnumeric is doing.)
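
To make the difference concrete, here is a small Python sketch that computes one chi-square statistic and evaluates it under both choices of df. It is not taken from Gnumeric, OOo or Excel; the helper names and the sample ranges are assumptions for illustration, and the tail probability uses a simple recurrence that is only valid for integer df.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)              # Q(1, t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # Q(1/2, t)
    # Recurrence Q(a + 1, t) = Q(a, t) + t^a * exp(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical n by m ranges as a user might pass them to the function.
observed = [[10, 20], [30, 40]]
expected = [[12, 18], [28, 42]]
n, m = len(observed), len(observed[0])

stat = chi2_statistic(observed, expected)
p_independence = chi2_sf(stat, (n - 1) * (m - 1))  # df described for OOo and Excel
p_goodness_of_fit = chi2_sf(stat, n * m - 1)       # df for a goodness-of-fit test
```

The statistic is identical in both cases; only the df, and therefore the reported probability, changes.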

> --------------
>                 Key: OFFICE-2309
>                 URL: http://tools.oasis-open.org/issues/browse/OFFICE-2309
>             Project: OASIS Open Document Format for Office Applications (OpenDocument) TC
>          Issue Type: Sub-task
>          Components: OpenFormula
>            Reporter: Robert Weir 
>            Assignee: Robert Weir 
>             Fix For: ODF 1.2 Part 2 CD 3
> > Note: Applications usually describe the CHITEST function as a 
> > Chi-square independence test. From a mathematical point of view this 
> > is not correct, as that would not involve testing some actual data 
> > against a set of expected values. It resembles more a Goodness-of-Fit 
> > test, but how the degree of freedom is calculated actually doesn't 
> > make sense then. This is specified to be interoperable with Excel and 
> > OpenOffice.org. Gnumeric gets different results if the number of rows 
> > and columns both are greater than 2.
> Well, I suggest comparing the results with the FISHER-EXACT test, e.g. in R.
> Also, every statistical package (R, EPI INFO, SPSS, ...) does NOT need the 
> expected values, as they compute them automatically from the n*m table. 
> I wonder why spreadsheets do NOT do it automatically, as well. Most 
> users simply fail to compute the correct value. Well, I try to teach 
> them, but almost everyone will get it wrong a week later. [It is way 
> easier to remember the shortcut for a 2x2 table, aka (ad-bc)^2 * N/(n1 * 
> n2 * n3 * n4), where n1-4 are the 4 subtotals, than to compute the 
> expected values accurately.]
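
The 2x2 shortcut quoted above can be checked against the full computation with expected values derived from the subtotals. The following is a minimal Python sketch; the function names and the cell counts a, b, c, d are hypothetical.

```python
def chi2_shortcut_2x2(a, b, c, d):
    """Shortcut (ad - bc)^2 * N / (n1 * n2 * n3 * n4) for the table [[a, b], [c, d]]."""
    total = a + b + c + d
    n1, n2 = a + b, c + d   # row subtotals
    n3, n4 = a + c, b + d   # column subtotals
    return (a * d - b * c) ** 2 * total / (n1 * n2 * n3 * n4)


def chi2_full_2x2(a, b, c, d):
    """Same statistic via explicit expected values computed from the subtotals."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    total = a + b + c + d
    return sum((observed[i][j] - rows[i] * cols[j] / total) ** 2
               / (rows[i] * cols[j] / total)
               for i in range(2) for j in range(2))


shortcut = chi2_shortcut_2x2(10, 20, 30, 40)
full = chi2_full_2x2(10, 20, 30, 40)
```

Both routes give the same statistic up to floating-point rounding; the shortcut just avoids writing out the four expected values.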


