Subject: Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
"David A. Wheeler" <dwheeler@dwheeler.com> wrote on 06/11/2009 09:12:04 PM: > > robert_weir@us.ibm.com wrote: > > I've been going through the current draft of OpenFormula, looking for > > areas that are specifically called out as "implementation-defined", > > "unspecified" or "undefined". I did not find as many as I thought I would > > find. > > I think that's a good thing! > > > I created a spreadsheet that illustrated each one of these cases, which I > > am attaching. > > Thanks for doing that. And I agree, a little form to fill these out > (for example) would be great. > Let me know if there are any missing that jump out in your mind. If there are, then they may not have been consistently called out with the term "implementation-defined" in the specification, and we should fix that. > > Then I wonder if we truly need to have all of these items be > > implementation-defined? Or to ask the question differently, would there > > be tangible user benefit, in terms of increased interoperability, if some > > of these items were fully specified, knowing that some implementations > > would then need to change their code in order to conform, and that they > > would need to deal (perhaps with version-conditional logic) with legacy > > documents? > > Re-examining these is a good idea. > > However, I think that expecting "0 implementation-defined values" is > both unrealistic and undesirable in most real standards, including this > one. In all non-trivial standards there are areas where there are > legitimate differences, and trying to prematurely force a specific > answer is simply undesirable. Simply identifying those areas, so users > know what to avoid, is a major benefit, even when we don't specify the > specifics. > I'd like to see us come up with a good reason why it is a good thing to have a feature be implementation-defined. Saying "mathematicians disagree" or "different implementations do different things" doesn't sound like a particularly good reason. 
I think it is expected that implementations will need to change their code to implement OpenFormula. I'd be astonished if they did not.

That said, I'm sure we can come up with some good reasons. For example, the exact numeric precision is not specified. This is not because having consistent floating point behavior would not be a good thing. The issue is that enforcing such uniformity, across machine architectures, would essentially mean that we avoid on-chip floating point and do it via emulation, which would perform poorly and be expensive to implement, with little incremental user benefit.

That's the analysis I'd like to see: what would the user benefit be if we eliminated these differences, and what would the downside be?

> Regarding the specifics, I have comments on two:
> * I have to admit, I'm tired of the 0^0 discussions, but we can have
> another one. There are good arguments for 1, or 0, or an Error, and
> actual implementations DO vary, so it's hard to pin that down. I don't
> think this has a massive impact on interoperability; it'd be NICE to pin
> down, but it can be managed.

The nice thing here is that any choice is defensible. It is not like any of them are wrong. We're not redefining the Gregorian calendar or anything. So I'd just pick one, based on the majority (or plurality if that is the case) behavior. Or do whatever Excel does here, if that makes Doug happy. I'd certainly have no hesitation in adding an "if" statement to the Symphony code if it were necessary for us to accommodate this.

> * In practice, I don't know why anyone would CARE what SUM() does with
> an empty argument list. This is not a REAL interoperability issue; it's
> hard to imagine a normal user even DOING that. We could leave that
> completely *undefined*, and not impact real world interoperability.
>

Strange things can happen.
Did you read this story last year, about Barclays Capital and their Excel error?

http://www.itworld.com/business/56161/excel-error-gives-barclays-more-lehman-assets-it-wanted

So a sheet can start out reasonable, and then become messed up by editing like that when someone tries to clean it up for printing or whatever. It might not be reasonable spreadsheet use, but I'd argue that even unreasonable spreadsheet use should usually have deterministic behavior, even if it is just an error indication.

In the case of SUM(), from a formal perspective, why isn't this just a syntax error? If a function is defined with a given parameter list, and the parameters are of the wrong number or type, or cannot be coerced to that type, then it is a syntax error. I'd expect that syntax errors of this type would be treated uniformly. It would be odd for SUM() to be different from any other function in that regard. But I'd be equally happy if we define SUM so it explicitly permits zero parameters, in which case we should explicitly define the result.

My concern is that in the post-Enron, post-Wall St. meltdown world, financial and risk models and similar are given far more scrutiny than ever before. With Sarbanes-Oxley, CEOs and CFOs are now directly and personally liable for errors when they sign off on their financial filings. A spreadsheet formula language that has substantial implementation-defined behaviors in calculation is one that can easily be made to sound very scary and risky to a potential adopter.

I bring this up now because I can easily see this being brought up later. You and I know that 0^0 or SUM() are very rare occurrences. But I also know that my competitors could easily make this sound far scarier to a potential adopter if they wanted to convince them not to adopt ODF 1.2 or an ODF 1.2 application.
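As an illustration of the two SUM() options discussed above, here is a minimal sketch in Python. It is not taken from the OpenFormula draft or from any implementation; the names `FunctionSpec`, `call_function`, `SUM_ALLOW_EMPTY` and the `Err:arity` marker are all hypothetical. It shows one way an evaluator could check argument counts uniformly for every function, so that an empty argument list produces a deterministic result either way.

```python
# Hypothetical sketch: uniform arity checking for spreadsheet functions.
# All names here are illustrative assumptions, not part of OpenFormula.
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FunctionSpec:
    impl: Callable[[Sequence[float]], float]
    min_args: int
    max_args: Optional[int] = None  # None means no upper bound


ARITY_ERROR = "Err:arity"  # placeholder for a uniform error indication

FUNCTIONS = {
    # Option A: SUM needs at least one argument, like any other function.
    "SUM": FunctionSpec(impl=sum, min_args=1),
    # Option B: zero arguments explicitly permitted, result defined as 0.
    "SUM_ALLOW_EMPTY": FunctionSpec(impl=sum, min_args=0),
    "ABS": FunctionSpec(impl=lambda a: abs(a[0]), min_args=1, max_args=1),
}


def call_function(name, args):
    """Apply one arity check to every function before evaluating it."""
    spec = FUNCTIONS[name]
    n = len(args)
    too_few = n < spec.min_args
    too_many = spec.max_args is not None and n > spec.max_args
    if too_few or too_many:
        return ARITY_ERROR
    return spec.impl(args)
```

Under either option the outcome of an empty argument list is written down in one place (the function table) rather than left to each implementation.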
So I'd like to recommend that we bias our decisions toward eliminating these implementation-defined behaviors, unless we can agree on a good reason why it is better that they remain.

In some cases I think we can make that argument, for example for the various implicit and explicit text-to-number conversions. They are undoubtedly useful, but are so tied up in locale-dependent issues that defining them completely would essentially require that we specify the numeration systems used by all of the world's languages. And even if we did that, the results would still not be interoperable, because a string like "1,234" would have one numeric value in a German-locale spreadsheet and another value in an English-locale spreadsheet.

I could see deprecating the implicit conversions altogether and requiring an explicit statement of locale for the explicit case, something like VALUE("1,234"; "en-US"). That would allow it to be well-defined and interoperable. But that is a more radical change and I'd agree to defer that for now and we can reconsider it in the next release.

> If we can nail down a few more specifics, that'd be great. But we
> needn't get hung up on this.
>

I agree we should not spend much time on this. But there are only a handful of these items, so it should not take long.

-Rob
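
To make the "1,234" example above concrete, here is a minimal Python sketch of a conversion that takes the locale as an explicit argument. The two-argument form of VALUE is only a suggestion in this message, so the function name `value`, the `SEPARATORS` table and the two locale tags are assumptions made for illustration. A real implementation would need proper locale data and would validate digit grouping, which this sketch does not.

```python
# Hypothetical sketch of an explicit-locale text-to-number conversion.
# The separator table covers only two locales and is an assumption made
# for this example, not data from any specification.
SEPARATORS = {
    # locale tag: (grouping separator, decimal separator)
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def value(text: str, locale_tag: str) -> float:
    """Parse `text` as a number using the separators of `locale_tag`."""
    group, decimal = SEPARATORS[locale_tag]
    # Drop grouping separators first, then normalize the decimal mark.
    normalized = text.strip().replace(group, "").replace(decimal, ".")
    return float(normalized)


print(value("1,234", "en-US"))  # comma is a grouping separator: 1234.0
print(value("1,234", "de-DE"))  # comma is the decimal mark: 1.234
```

The same string yields two different numbers, which is the interoperability problem; passing the locale explicitly makes the result depend only on the formula's arguments and not on the environment that evaluates it.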