office-formula message

Subject: Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

From: robert_weir@us.ibm.com
To: "office-formula@lists.oasis-open.org" <office-formula@lists.oasis-open.org>
Date: Fri, 12 Jun 2009 00:19:01 -0400

"David A. Wheeler" <dwheeler@dwheeler.com> wrote on 06/11/2009 09:12:04 
PM:

> 
> robert_weir@us.ibm.com wrote:
> > I've been going through the current draft of OpenFormula, looking for 
> > areas that are specifically called out as "implementation-defined", 
> > "unspecified" or "undefined".  I did not find as many as I thought I 
> > would find.
> 
> I think that's a good thing!
> 
> > I created a spreadsheet that illustrated each one of these cases, 
> > which I am attaching.
> 
> Thanks for doing that.  And I agree, a little form to fill these out 
> (for example) would be great.
> 

Let me know if there are any missing that jump out in your mind.  If there 
are, then they may not have been consistently called out with the term 
"implementation-defined" in the specification, and we should fix that.

> > Then I wonder if we truly need to have all of these items be 
> > implementation-defined?  Or to ask the question differently, would 
> > there be tangible user benefit, in terms of increased interoperability, 
> > if some of these items were fully specified, knowing that some 
> > implementations would then need to change their code in order to 
> > conform, and that they would need to deal (perhaps with 
> > version-conditional logic) with legacy documents?
> 
> Re-examining these is a good idea.
> 
> However, I think that expecting "0 implementation-defined values" is 
> both unrealistic and undesirable in most real standards, including this 
> one.  In all non-trivial standards there are areas where there are 
> legitimate differences, and trying to prematurely force a specific 
> answer is simply undesirable.  Simply identifying those areas, so users 
> know what to avoid, is a major benefit, even when we don't specify the 
> specifics.
> 

I'd like to see us come up with a good reason why it is a good thing to 
have a feature be implementation-defined.  Saying "mathematicians 
disagree" or "different implementations do different things" doesn't sound 
like a particularly good reason.  I think it is expected that 
implementations will need to change their code to implement OpenFormula. 
I'd be astonished if they did not.

That said, I'm sure we can come up with some good reasons.  For example, 
the exact numeric precision is not specified.  This is not because having 
consistent floating point behavior would not be a good thing.  The issue 
is that enforcing such uniformity, across machine architectures, would 
essentially mean that we avoid on-chip floating point and do it via 
emulation, which would perform poorly and be expensive to implement, with 
little incremental user benefit.
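
To make the precision point concrete, here is a minimal sketch. It assumes two hypothetical implementations, one accumulating in IEEE 754 double precision and one in single precision (emulated here by rounding through the `struct` module). Neither corresponds to any real spreadsheet application; the function names are illustrative only.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_double(values):
    """Accumulate in double precision, as a plain Python loop does."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_single(values):
    """Accumulate in emulated single precision by rounding after every step."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


values = [0.1] * 10
double_total = sum_double(values)
single_total = sum_single(values)

# Both results are legitimate floating-point answers, but they are not the
# same number, and neither is exactly 1.0 in the double-precision case.
print(repr(double_total), repr(single_total))
print(double_total == single_total)
```

Forcing both implementations to agree would mean one of them giving up its native arithmetic, which is the cost described above.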

That's the analysis I'd like to see:  What is the user benefit if we 
eliminated these differences, versus what would be the downside?

> Regarding the specifics, I have comments on two:
> * I have to admit, I'm tired of the 0^0 discussions, but we can have 
> another one.  There are good arguments for 1, or 0, or an Error, and 
> actual implementations DO vary, so it's hard to pin that down.  I don't 
> think this has a massive impact on interoperability; it'd be NICE to pin 
> down, but it can be managed.

The nice thing here is that any choice is defensible.  It is not like any 
of them are wrong.  We're not redefining the Gregorian calendar or 
anything.  So I'd just pick one, based on the majority (or plurality if 
that is the case) behavior. Or do whatever Excel does here, if that makes 
Doug happy.  I'd certainly have no hesitancy to add an "if" statement to 
the Symphony code if it were necessary for us to accommodate this.
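
The kind of "if" statement I have in mind can be sketched as follows. This is not Symphony code; the `ZeroPowZero` policy enum, the `FormulaError` exception, and the `power` function are hypothetical names used only to show that each of the three candidate behaviors is a one-branch special case.

```python
import math
from enum import Enum


class ZeroPowZero(Enum):
    """The three candidate results for 0^0 discussed in the thread."""
    ONE = "one"
    ZERO = "zero"
    ERROR = "error"


class FormulaError(Exception):
    """Hypothetical stand-in for a spreadsheet error value."""


def power(base: float, exponent: float,
          policy: ZeroPowZero = ZeroPowZero.ONE) -> float:
    # The single special case: everything else is delegated to math.pow.
    if base == 0 and exponent == 0:
        if policy is ZeroPowZero.ONE:
            return 1.0
        if policy is ZeroPowZero.ZERO:
            return 0.0
        raise FormulaError("0^0 is treated as an error under this policy")
    return math.pow(base, exponent)


print(power(0, 0, ZeroPowZero.ONE))
print(power(0, 0, ZeroPowZero.ZERO))
print(power(2, 10))
```

The default of `ONE` in the sketch is arbitrary; whichever result the specification picks, the change in an implementation is about this size.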

> * In practice, I don't know why anyone would CARE what SUM() does with 
> an empty argument list.  This is not a REAL interoperability issue; it's 
> hard to imagine a normal user even DOING that.  We could leave that 
> completely *undefined*, and not impact real world interoperability.
> 

Strange things can happen.  Did you read this story last year, about 
Barclays Capital and their Excel error: 
http://www.itworld.com/business/56161/excel-error-gives-barclays-more-lehman-assets-it-wanted

So a sheet can start out reasonable, and then become messed up by editing 
like that when someone tries to clean it up for printing or whatever.  It 
might not be reasonable spreadsheet use, but I'd argue that even 
unreasonable spreadsheet use should usually have deterministic behavior, 
even if it is just an error indication.

In the case of SUM(), from a formal perspective, why isn't this just a 
syntax error? If a function is defined with a given parameter list, and 
the parameters are of the wrong number or type, or cannot be coerced to 
that type, then it is a syntax error.  I'd expect that syntax errors of 
this type would be treated uniformly.  It would be odd for SUM() to be 
different than any other function in that regard.

But I'd be equally happy if we define SUM so it explicitly permits zero 
parameters, in which case we should explicitly define it.
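
The two options can be sketched side by side. This is only an illustration: the arity tables, the `FormulaSyntaxError` name, and the `evaluate` function are hypothetical, and returning 0 for an empty SUM in the permissive table is my assumption (the additive identity), not something the draft states.

```python
from typing import Dict, Optional, Sequence, Tuple


class FormulaSyntaxError(Exception):
    """Hypothetical error raised when a call does not match the declared parameter list."""


# name -> (minimum argument count, maximum argument count or None for unbounded)
Arity = Tuple[int, Optional[int]]

# Option 1: SUM requires at least one parameter, so SUM() is rejected
# by the same rule that rejects ABS().
STRICT: Dict[str, Arity] = {"SUM": (1, None), "ABS": (1, 1)}

# Option 2: SUM is explicitly defined to permit zero parameters.
PERMISSIVE: Dict[str, Arity] = {"SUM": (0, None), "ABS": (1, 1)}

FUNCTIONS = {
    "SUM": lambda args: sum(args),  # sum([]) is 0 in Python
    "ABS": lambda args: abs(args[0]),
}


def evaluate(name: str, args: Sequence[float], arities: Dict[str, Arity]):
    """Check the argument count uniformly for every function, then evaluate."""
    low, high = arities[name]
    if len(args) < low or (high is not None and len(args) > high):
        raise FormulaSyntaxError(f"{name} called with {len(args)} argument(s)")
    return FUNCTIONS[name](args)


print(evaluate("SUM", [1, 2, 3], STRICT))
print(evaluate("SUM", [], PERMISSIVE))
```

Either table gives deterministic behavior for SUM(); what matters is that the specification says which one applies.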

My concern is that in the post-Enron, post-Wall St. meltdown world, 
financial and risk models and similar are given far more scrutiny than 
ever before.  With Sarbanes-Oxley, CEOs and CFOs are now directly and 
personally liable for errors when they sign off on their financial 
filings.  A spreadsheet formula language that has substantial 
implementation-defined behaviors in calculation is one that can easily be 
made to sound very scary and risky to a potential adopter.  I bring this 
up now because I can easily see this being brought up later.  You and I 
know that 0^0 or SUM() are very rare occurrences.  But I also know that my 
competitors could easily make this sound far scarier to a potential 
adopter if they wanted to convince them not to adopt ODF 1.2 or an ODF 1.2 
application. 

So I'd like to recommend that we bias our decisions toward eliminating 
these implementation-defined behaviors, unless we can agree on a good 
reason why it is better that they remain.  In some cases I think we can 
make that argument.  For example, the various text to number implicit and 
explicit conversions.  They are undoubtedly useful, but are so tied up in 
locale-dependent issues that defining them completely would essentially 
require that we specify the numeral systems used by all of the world's 
languages.  And even if we did that, the results would still not be 
interoperable, because a string like "1,234" would have one numeric value 
in a German-locale spreadsheet and another value in an English-locale 
spreadsheet.

I could see deprecating the implicit conversions altogether and requiring 
an explicit statement of locale for the explicit case, something like 
VALUE("1,234"; "en-US").  That would allow it to be well-defined and 
interoperable.  But that is a more radical change and I'd agree to defer 
that for now and we can reconsider it in the next release.
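
As a rough sketch of what an explicit-locale conversion could look like: the `value` function and the two-entry `SEPARATORS` table below are hypothetical, cover only two locales, and do no validation of where grouping separators appear. The point is just that the locale argument removes the ambiguity in a string like "1,234".

```python
# locale tag -> (grouping separator, decimal separator)
# Assumed conventions for illustration; a real table would be far larger.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def value(text: str, locale: str) -> float:
    """Convert text to a number using the separators of the stated locale."""
    try:
        grouping, decimal = SEPARATORS[locale]
    except KeyError:
        raise ValueError(f"unsupported locale: {locale!r}")
    # Drop grouping separators, then normalize the decimal separator to "."
    normalized = text.strip().replace(grouping, "").replace(decimal, ".")
    return float(normalized)


print(value("1,234", "en-US"))  # the comma is a grouping separator
print(value("1,234", "de-DE"))  # the comma is the decimal separator
```

With the locale stated in the formula, the same document gives the same number no matter which locale the application happens to be running in.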

> If we can nail down a few more specifics, that'd be great.  But we 
> needn't get hung up on this.
> 

I agree we should not spend much time on this.  But there are only a 
handful of these items, so it should not take long.

-Rob

Follow-Ups:
- RE: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
  - From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
- Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
  - From: Andreas J Guelzow <aguelzow@math.concordia.ab.ca>

References:
- Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
  - From: robert_weir@us.ibm.com
- Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>