Subject: Re: [office-comment] Demand for modification of ODF file format about regression curve in spreadsheet
Dear Laurent,

I miss some frequently encountered regression types. The most frequent regression type for binary outcome variables is logistic regression, so I miss that one in particular.

However, what puzzles me most is the number of regression types used. To put it differently, there is a specific name for every new regression type. There is a better alternative, and it is already implemented in the S+ language and in the open source R program: it allows the user to specify the formula for the regression.

There are basically 3 regression models:

A.) Linear regression
  - formulas of type: y = intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...
  - as seen, ODF doesn't permit a multivariate formula either, i.e. one where X1, X2, X3, ... are different variables

B.) Generalized linear models
  - formulas differ slightly, but in the case of a logistic regression:
    p(y) = 1 / (1 + 1/exp(intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...))
    where y is a binary variable and p(y) the probability of y

C.) Non-linear models
  - this is the most interesting one
  - it allows specifying the formula for the regression
  - e.g. let's say we want to determine the coefficients a and b for:
    a * x / (x*x + b)
    in R, this looks like:
    model.nls <- nls(y ~ a*x / (x*x + b), start=list(a=1, b=1))
    where y is the outcome and x is the variable

As a practical example: [You can copy / paste this in R]

    x <- rnorm(1000)                          # generate 1,000 random numbers
    y <- rnorm(1000) + rnorm(1) * x / (x*x+1) # noise plus a randomly scaled signal
    # fit a, b and c by non-linear least squares, starting from the given values
    x.nls <- nls(y ~ a*x / (x*x + b*x + c), start=list(a=1, b=0, c=1))
    summary(x.nls)

> Formula: y ~ a * x/(x * x + b * x + c)
>
> Parameters:
>   Estimate Std. Error t value Pr(>|t|)
> a -1.83005    0.29605  -6.182 9.24e-10 ***
> b -0.04774    0.14024  -0.340    0.734
> c  1.40155    0.35845   3.910 9.85e-05 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We see that "b" is statistically non-significant, so we can remove it from the model (giving us a * x / (x*x + c); we can rerun the regression using this formula to obtain a better result).

The idea is: I want a mechanism to specify the formula used in the regression. Instead of storing a formula name, it would be wiser to store the formula itself. This way, one can easily build *complex models* and *multivariate models* (more than one variable). This is currently not possible, and ODF lags behind professional packages in every respect (well, Excel fares poorly in this respect, too, but then you shouldn't look at Excel when doing regressions).

Sincerely,

Leonard

Laurent BALLAND-POIRIER wrote:
> Dear TC Members,
>
> Please find enclosed a file format modification demand that Ingrid
> Halama and me wrote. It is about regression curves in spreadsheet. Some
> data are missing in ODF to get compatibility with other spreadsheets
> such as MS-Excel or Gnumeric. Numerous issues will not be solved till
> these data can not be saved.
> I hope I post in the right place. If not, please explain where to send
> this demand.
>
> Best regards,
>
> Laurent BP
>
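For completeness, model types A.) and B.) from the reply above can be written in R's formula notation in the same way as the nls() example. The following is only a minimal sketch on simulated data: the variable names (x1, x2, x3, y.lin, y.bin), the sample size, the seed and the "true" coefficients are all made up for illustration and do not come from the original message.

```r
set.seed(42)                 # arbitrary seed, only to make the sketch reproducible
n  <- 500                    # arbitrary sample size
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# A.) Linear regression: y = intercept + a1 * X1 + a2 * X2 + a3 * X3
y.lin  <- 1 + 2 * x1 - 0.5 * x2 + 0.25 * x3 + rnorm(n)
fit.lm <- lm(y.lin ~ x1 + x2 + x3)

# B.) Logistic regression: p(y) = 1 / (1 + 1/exp(intercept + a1 * X1 + a2 * X2))
p       <- 1 / (1 + 1 / exp(-0.5 + 1.5 * x1 - 1 * x2))
y.bin   <- rbinom(n, size = 1, prob = p)   # binary outcome drawn with probability p
fit.glm <- glm(y.bin ~ x1 + x2, family = binomial)

print(coef(fit.lm))          # estimated intercept, a1, a2, a3
print(coef(fit.glm))         # estimated intercept, a1, a2 on the logit scale
```

In both calls the model is described entirely by the formula (plus the family for the generalized linear model), which is the kind of information the reply suggests storing instead of a regression-type name.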