[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [office-comment] Re: [office] Next set of public comments forreview (#18-#35)
Leonard Mada: > this is a comment to one of the threads on the restricted > office@lists.oasis-open.org list. > I believe the discussion is too important and the current implementation > is too dangerous to keep the current state of affairs. Happy to discuss, but I don't at all agree that the current implementation is "too dangerous". People have used it, and accepted it, in spreadsheets for more than 20 years. > > I believe #24 is done. This involved handling references to empty > > cells; new text in section 4 (Types) clarifies what happens in these > > cases. > > > > Basically, there isn't a special type called the empty type. > > This goes strongly against ALL concepts of professional statistical > applications, where such values are given a special type/identifier, > namely these are "missing values", or shortly NA. (e.g. R, > http://cran.R-project.org, and others) First, R is not a spreadsheet application, it's a statistical package. It's spectacular at that purpose, but OpenFormula is _primarily_ for use in spreadsheet applications. We want OpenFormula to be _usable_ in other circumstances, but those must take second fiddle to its use in spreadsheets. I quite agree that there is a need for indicating "not available", but the standard way to do that in spreadsheets (going back to at least Lotus 1-2-3) is to use #N/A, and NOT merely an empty cell. The value #N/A is a distinguished Error value returned by function NA(). So the mechanism for identifying important missing values already exists. Using empty cells to mean #N/A is a dangerous practice; spreadsheets typically contain many empty cells, and a viewer couldn't easily tell which empty cells were "important". > > Instead, there is a type "Reference", which _may_ refer to empty cells > > or ranges that include empty cells. What happens to empty cells > > depends on the required types and implicit conversion rules, as > > defined in section 4. > > I hate implicit conversion rules. Namely, the conversion to (int/float) > 0 is mostly unneeded and unwanted. We actually have FEWER implicit conversion rules compared to, say, OOXML. Excel and OOXML require that _ANY_ reference to text in a number context be FORCEABLY converted to a number. That particular conversion is fraught with peril, because it can fail when you change locale. (e.g., if the string is "5.5", it will convert find in en_US but will fail in fr because the decimal separator there is "," not "."). And many spreadsheet programs don't do that particular implicit conversion (e.g., Lotus 1-2-3, OOo). Thankfully, few spreadsheets actually require this particular conversion. But we have to handle _existing_ spreadsheet applications, and be usable to ordinary users. They DO expect several implicit conversions, and build documents that require them. What's more, most spreadsheet authors are NOT skilled statisticians. We need to make the language relatively easy-to-use for these less-skilled people. Most languages have automatic conversions, because it's too painful to use those languages without them. This is no different, and in any case, we're specifying _existing_ behavior which is necessary to exchange existing spreadsheet documents. > There are basically 2 useful approaches: > A.) ignore the cell completely > -- without raising any error > -- without interpreting the cell as 0 or empty string > -- should be probably the default Looks like we agree completely, at least in the case where the function expects a NumberSequence or the like. If a function expects a NumberSequence, it will ignore empty cells completely, without interpreting the cell as 0 or the empty string. That's already spec'ed. If a function expects a _scalar_ value, though, and gets a reference to an empty cell, what do we do? Your first two criteria conflict; either we raise an error, or we interpret it somehow as a scalar (e.g., as a 0 or ""). We've decided to follow standard convention, and convert. It's what existing documents require, and what existing users expect. > B.) Interpret as an NA-Error and propagate the error downstream > -- probably only in a minority of cases > [e.g. debugging spreadsheets] Again, sounds like we agree that these are minority cases. You can actually do this today by using ISBLANK or COUNTBLANK, e.g.: IF(COUNBLANK(Range)>0; #N/A; dostuff(Range)) > As I (probably) mentioned in some previous message [I can't find it, > though], R has a more advanced mechanism to handle missing values, and I > would welcome any improvement in this area regarding ODF. > > IF a value is missing, one can pass additional arguments to R-functions, > which specify how to handle those missing values. No existing spreadsheet application includes a mechanism for additional named arguments. Obviously there are languages which have it (R, Ada, and Python come to mind). But adding a major new unprecedented mechanism to handle a rare case seems extreme. If you want values to be considered as NA, then put #N/A in that cell. Problem solved. It's also MUCH clearer what's going on. > So, the default in R is *na.omit* and this is what spreadsheets should > be doing, too. Omit the whole cell, don't include it in any > calculations, do NOT even evaluate the empty cell! Definitely, do not > evaluate it as "0"! That's already the case when a NumberSequence or other Sequence is the required type, and as I noted earlier, there really isn't a reasonable alternative default when a scalar is required. --- David A. Wheeler
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]