
*Subject*: **Re: [office-comment] Demand for modification of ODF file format about regression curve in spreadsheet**

*From*: **Patrick Durusau <patrick@durusau.net>**

*To*: Leonard Mada <discoleo@gmx.net>

*Date*: Mon, 08 Dec 2008 15:39:59 -0500

Leonard,

When you say:

> The idea is:
> I want a mechanism to specify the formula used in the regression.
> Instead of storing a formula name, it would be wiser to store the
> formula itself. This way, one can easily build *complex models* and
> *multivariate models* (more than one variable). This is currently
> not-possible and ODF lags behind professional packages in every
> respect (well, Excel fares poor in this respect, too, but then you
> shouldn't look at Excel when doing regressions).

Would you suggest that we use R or something similar as the language for such models? (I have utterly no position one way or the other but would like to see us avoid having to define a language for such purposes and then seek implementers for it.)

What would that mean in your experience for interchange? I know of R by the name but don't know its history or the level of support for various versions.

Would this be a situation where the results of a model would be stored in case the document was processed by an application that lacked R support (assuming we chose that as the language)?

Hope you are having a great day!

Patrick

Leonard Mada wrote:
> Dear Laurent,
>
> I miss some frequently encountered regression types.
>
> The most frequent regression type on binary outcome variables is a
> logistic regression. I therefore miss this one.
>
> However, what wonders me most, is the number of regression types used.
> Well, to state it differently, there is a specific name for every new
> regression type.
>
> There is a better alternative, and this alternative is already
> implemented in the S+ language and in the open source R program. It
> basically allows the user to specify the formula for the regression.
>
> There are basically 3 regression models:
>
> A.) Linear regression
> - formulas of type: y = intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...
> - as seen, ODF doesn't permit a multivariate formula either,
>   i.e. X1, X2, X3, ... are different variables
>
> B.) Generalized linear models
> - formulas differ slightly, but in the case of a logistic regression:
>   p(y) = 1 / (1 + 1/exp(intercept + a1 * X1 + a2 * X2 + a3 * X3 + ...) )
>   where y is a binary variable and p(y) the probability of y
>
> C.) Non-Linear models
> - this is the most interesting
> - it allows specifying the formula for the regression
> - e.g. lets say we want to determine the coefficients a & b for:
>   a * x / (x*x + b)
>   in R, this looks like:
>   model.nls <- nls( y ~ a*x / (x*x + b), start=list(a=1, b=1))
>   where y is the outcome and x is the variable
>
> As a practical example:
> [You can copy / paste this in R]
> x <- rnorm(1000) # generate 1,000 random numbers
> y <- rnorm(1000) + rnorm(1) * x / (x*x+1)
> x.nls<-nls(y~a*x / (x*x+b*x+c), start=list(a=1,b=0,c=1))
> summary(x.nls)
>
>> Formula: y ~ a * x/(x * x + b * x + c)
>>
>> Parameters:
>>   Estimate Std. Error t value Pr(>|t|)
>> a -1.83005    0.29605  -6.182 9.24e-10 ***
>> b -0.04774    0.14024  -0.340    0.734
>> c  1.40155    0.35845   3.910 9.85e-05 ***
>> ---
>> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> We see, "b" is statistically non-significant and we can remove it from
> the model (giving us then a * x / (x*x + c); we can rerun the
> regression using this formula to obtain a better result ).
>
> The idea is:
> I want a mechanism to specify the formula used in the regression.
> Instead of storing a formula name, it would be wiser to store the
> formula itself. This way, one can easily build *complex models* and
> *multivariate models* (more than one variable). This is currently
> not-possible and ODF lags behind professional packages in every
> respect (well, Excel fares poor in this respect, too, but then you
> shouldn't look at Excel when doing regressions).
>
> Sincerely,
>
> Leonard
>
>
> Laurent BALLAND-POIRIER wrote:
>> Dear TC Members,
>>
>> Please find enclosed a file format modification demand that Ingrid
>> Halama and me wrote. It is about regression curves in spreadsheet. Some
>> data are missing in ODF to get compatibility with other spreadsheets
>> such as MS-Excel or Gnumeric. Numerous issues will not be solved till
>> these data can not be saved.
>> I hope I post in the right place. If not, please explain where to send
>> this demand.
>>
>> Best regards,
>>
>> Laurent BP
>>
>
>

-- 
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
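
Editorial note for readers of the thread: models A.) and B.) in the quoted message are described by formula only, without code. The following is a minimal sketch of how R's formula interface might express them, assuming simulated data. The variable names (`x1`, `x2`, `y_lin`, `y_bin`) and the coefficient values are illustrative and do not come from the thread. It also shows one possible way to keep the formula text next to the fitted coefficients, which is roughly the fallback raised in the reply (stored results for applications without R support); nothing here is a proposed ODF syntax.

```r
# Simulated data; every name and numeric value below is an illustrative assumption.
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)

# A.) Linear regression with two predictors: y = intercept + a1*x1 + a2*x2
y_lin  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)
fit_lm <- lm(y_lin ~ x1 + x2)

# B.) Logistic regression on a binary outcome:
#     p(y) = 1 / (1 + 1/exp(intercept + a1*x1 + a2*x2))
p       <- 1 / (1 + 1 / exp(0.5 + 1.5 * x1 - 1 * x2))
y_bin   <- rbinom(n, size = 1, prob = p)
fit_glm <- glm(y_bin ~ x1 + x2, family = binomial)

# The formula is part of the fitted object and can be turned into plain text,
# so a formula string and its estimated coefficients can be stored together.
formula_text <- deparse(formula(fit_lm))
stored <- list(formula = formula_text, coefficients = coef(fit_lm))

print(stored$formula)   # "y_lin ~ x1 + x2"
print(stored$coefficients)
print(coef(fit_glm))
```

An application that cannot evaluate the formula could still display the stored coefficients, while one that can evaluate it could refit the model from the formula text.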

**Follow-Ups**: **Re: [office-comment] Demand for modification of ODF file format about regression curve in spreadsheet** *From:* Leonard Mada <discoleo@gmx.net>

**References**: **Demand for modification of ODF file format about regression curve in spreadsheet** *From:* Laurent BALLAND-POIRIER <Laurent.Balland-Poirier@laposte.net>

**Re: [office-comment] Demand for modification of ODF file format about regression curve in spreadsheet** *From:* Leonard Mada <discoleo@gmx.net>
