office-comment message
Subject: Re: [office-comment] Demand for modification of ODF file format about regression curve in spreadsheet


Leonard,

When you say:

> The idea is:
> I want a mechanism to specify the formula used in the regression. 
> Instead of storing a formula name, it would be wiser to store the 
> formula itself. This way, one can easily build *complex models* and 
> *multivariate models* (more than one variable). This is currently 
> not possible and ODF lags behind professional packages in every 
> respect (well, Excel fares poorly in this respect, too, but then you 
> shouldn't look at Excel when doing regressions). 
Would you suggest that we use R or something similar as the language for 
such models? (I have utterly no position one way or the other but would 
like to see us avoid having to define a language for such purposes and 
then seek implementers for it.)

What would that mean in your experience for interchange?

I know of R by name but don't know its history or the level of 
support for various versions.

Would this be a situation where the results of a model would be stored 
in case the document was processed by an application that lacked R 
support (assuming we chose that as the language)?

Hope you are having a great day!

Patrick

Leonard Mada wrote:
> Dear Laurent,
>
> I miss some frequently encountered regression types.
>
> The most frequent regression type on binary outcome variables is a 
> logistic regression. I therefore miss this one.
>
> However, what puzzles me most is the number of regression types used. 
> To put it differently, there is a specific name for every new 
> regression type.
>
> There is a better alternative, and this alternative is already 
> implemented in the S+ language and in the open source R program. It 
> basically allows the user to specify the formula for the regression.
>
> There are basically 3 regression models:
>
> A.) Linear regression
> - formulas of type: y = intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...
> - as seen, ODF doesn't permit a multivariate formula either,
> i.e. X1, X2, X3, ... are different variables
>
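[Illustration, not part of the original message.] The multivariate linear form in (A) can be sketched in a few lines of plain Python. This is only a minimal sketch: the names fit_linear and solve_linear_system are hypothetical, the fit uses the textbook normal equations, and nothing here is taken from ODF or from R's own implementation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear(rows, y):
    """Least-squares fit of y = intercept + a1*X1 + a2*X2 + ...

    rows holds one [X1, X2, ...] list per observation.
    Returns [intercept, a1, a2, ...].
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from y = 1 + 2*X1 - 3*X2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coefficients = fit_linear(rows, y)
```

The point of the sketch is that X1 and X2 are separate variables in one model, which is what a single named regression type per curve cannot express.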
> B.) Generalized linear models
> - formulas differ slightly, but in the case of a logistic regression:
> p(y) = 1 / (1 + 1/exp(intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...) )
> where y is a binary variable and p(y) the probability of y
>
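[Illustration, not part of the original message.] The logistic formula in (B) can be evaluated directly. The sketch below assumes the coefficients are already known (it does not fit them), and the function name logistic_probability is hypothetical. It uses exp(-z), which equals the 1/exp(z) written above.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Return p(y) = 1 / (1 + 1/exp(intercept + a1*X1 + a2*X2 + ...))."""
    linear = intercept + sum(a * x for a, x in zip(coefficients, values))
    # exp(-linear) is the same quantity as 1/exp(linear).
    return 1.0 / (1.0 + math.exp(-linear))


# When the linear part is 0 the probability is exactly 0.5.
p = logistic_probability(0.0, [1.0, -1.0], [2.0, 2.0])  # 0.5
```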
> C.) Non-Linear models
> - this is the most interesting
> - it allows specifying the formula for the regression
> - e.g. let's say we want to determine the coefficients a & b for:
> a * x / (x*x + b)
> in R, this looks like:
> model.nls <- nls( y ~ a*x / (x*x + b), start=list(a=1, b=1))
> where y is the outcome and x is the variable
>
> As a practical example:
> [You can copy / paste this in R]
> x <- rnorm(1000) # generate 1,000 standard-normal random numbers
> y <- rnorm(1000) + rnorm(1) * x / (x*x+1) # noise plus a randomly scaled x / (x*x + 1)
> x.nls <- nls(y ~ a*x / (x*x + b*x + c), start=list(a=1, b=0, c=1)) # estimate a, b and c
> summary(x.nls)
>
>> Formula: y ~ a * x/(x * x + b * x + c)
>>
>> Parameters:
>>   Estimate Std. Error t value Pr(>|t|)
>> a -1.83005    0.29605  -6.182 9.24e-10 ***
>> b -0.04774    0.14024  -0.340    0.734
>> c  1.40155    0.35845   3.910 9.85e-05 ***
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> We see that "b" is not statistically significant, so we can remove it 
> from the model (giving a * x / (x*x + c)) and rerun the regression 
> with this formula to obtain a better result.
>
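[Illustration, not part of the original message.] For readers without R at hand, the reduced model a * x / (x*x + c) can be fitted with a plain Gauss-Newton iteration. This is a minimal sketch under stated assumptions: the function name fit_rational is hypothetical, there is no step-size control or significance testing, and the data are made up and noise-free so the known coefficients (a = 2, c = 1.5) can be recovered. It is not a reimplementation of R's nls().

```python
def fit_rational(xs, ys, a=1.0, c=1.0, iterations=50, tol=1e-12):
    """Fit y ~ a*x / (x*x + c) by undamped Gauss-Newton; returns (a, c)."""
    for _ in range(iterations):
        saa = sac = scc = ga = gc = 0.0
        for x, y in zip(xs, ys):
            denom = x * x + c
            ja = x / denom                     # d(model)/da
            jc = -a * x / (denom * denom)      # d(model)/dc
            r = y - a * ja                     # residual at current a, c
            saa += ja * ja
            sac += ja * jc
            scc += jc * jc
            ga += ja * r
            gc += jc * r
        det = saa * scc - sac * sac
        if det == 0.0:
            break                              # singular system, give up
        # Solve the 2x2 normal equations (J^T J) d = J^T r by Cramer's rule.
        da = (ga * scc - sac * gc) / det
        dc = (saa * gc - sac * ga) / det
        a += da
        c += dc
        if abs(da) + abs(dc) < tol:
            break
    return a, c


# Made-up, noise-free data generated from a = 2, c = 1.5.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 * x / (x * x + 1.5) for x in xs]
a_hat, c_hat = fit_rational(xs, ys)
```

With noisy data or poor starting values an undamped iteration like this can diverge, which is one reason to rely on an established package for real work.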
> The idea is:
> I want a mechanism to specify the formula used in the regression. 
> Instead of storing a formula name, it would be wiser to store the 
> formula itself. This way, one can easily build *complex models* and 
> *multivariate models* (more than one variable). This is currently 
> not possible and ODF lags behind professional packages in every 
> respect (well, Excel fares poorly in this respect, too, but then you 
> shouldn't look at Excel when doing regressions).
>
> Sincerely,
>
> Leonard
>
>
> Laurent BALLAND-POIRIER wrote:
>> Dear TC Members,
>>
>> Please find enclosed a file format modification request that Ingrid
>> Halama and I wrote. It is about regression curves in spreadsheets. Some
>> data are missing in ODF to achieve compatibility with other spreadsheets
>> such as MS-Excel or Gnumeric. Numerous issues will not be solved until
>> these data can be saved.
>> I hope I am posting in the right place. If not, please explain where to
>> send this request.
>>
>> Best regards,
>>
>> Laurent BP
>>
>
>

-- 
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
