office-formula message

Subject: RE: [office-formula] Test Cases
From: "Dennis E. Hamilton" <dennis.hamilton@acm.org>
To: "'Eric Patterson'" <ericpa@exchange.microsoft.com>,<office-formula@lists.oasis-open.org>
Date: Tue, 19 Jan 2010 07:55:07 -0800
I'm uneasy about that description of the situation because, although it
describes one of many states-of-affairs, it doesn't specify anything.  It
also doesn't indicate how one might get a handle on limitations of
computational precision in the arithmetic operators and the functions that
necessarily deliver approximations and are also subject to computational
limitations.

I think there are two intertwined matters here.

One has to do with what the data types are and the different levels of
abstraction and representation that we need to have our heads around when we
talk about a Number, a String (or Text), etc.

Sticking to questions of test cases I have three immediate concerns:

  1. With regard to test cases in the specification, I find that I am
seriously ambivalent (more about that after my signature).  I don't believe
there should be test cases in this specification and they certainly should
not have normative significance. Normative observations about how well a
computation behaves and how consistent the computational behavior is in a
formula evaluator need to be expressed some other way.  In particular, one
should be able to know what is needed without seeing test cases.

  2. With regard to the use of indicators for approximation error, the
little epsilons that are shown, these are not informative.  For example, we
don't account in any way for how epsilons on the inputs bear on the epsilon of
the result.  E.g., for very small values of x, approximations for
sin(x+epsilon) will return x+epsilon for epsilon measurable but smaller in
magnitude than x.  On the other hand, if we are talking about exp(x+epsilon)
the result will be something like exp(x)*exp(epsilon) with whatever epsilon2
there is on the result itself.  I use these examples because they are easy
to visualize mathematically.  There are similar situations around use of
interest rates, results that will be made into exact currency values, and so
on, and the handling of ratios is generally a source of surprises.  I am not
proposing a solution to this, but we need to not be foolish about it.  It
would be more useful, to me, to see examples that demonstrate what sort of
epsilon arises in a given implementation or use case so that it could be
noticed and folks (both implementers and expert users assessing
implementations) would not be surprised.

  3. When we talk about test results, I wonder WHAT VALUE ARE WE TALKING
ABOUT?  WHAT VALUE ARE WE LOOKING AT?  When an operation (or a literal
number in a formula) delivers an intermediate result to another operation,
this may be quite a different matter from when a persistent result is
retained as the value of a cell of some sort.  We then need to distinguish
what that value may be (as it will be delivered into the recalculation of
some other formula or cell) versus (1) how that value is persisted for
communication as a cell value or result in a document format and (2) how
that value might appear as a viewable result (displayed or printed) after
formatting rules and other conversions (such as implicit rounding) are
applied.  
   I suspect, in this case, we are concerned with the evaluation result that
a formula-hosting implementation is sitting on.  But in that case, one
assesses that value indirectly, usually with other formulas, not by casual
inspection of the way the result is shown by default in a cell-value
display.
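
To make points 2 and 3 concrete, here is a minimal Python sketch, assuming
IEEE 754 binary64 floats (Python's float).  The helper name `propagated` and
the sample values are illustrative only; they come from neither the
specification nor any existing test case.  The first part shows how the same
input epsilon yields very different output epsilons for sin near zero and for
exp.  The second part contrasts the value an evaluator is holding with what a
two-decimal display of it would show.

```python
import math

def propagated(f, x, eps):
    """Return how much f's result moves when its input is perturbed by eps."""
    return f(x + eps) - f(x)

eps = 1e-9

# sin near zero behaves like the identity, so the output error is about eps.
sin_shift = propagated(math.sin, 1e-3, eps)

# exp(x + eps) = exp(x) * exp(eps), so the output error is about exp(x) * eps.
exp_shift = propagated(math.exp, 20.0, eps)

print(sin_shift / eps)  # close to 1
print(exp_shift / eps)  # close to exp(20)

# Point 3: the held value versus a formatted, implicitly rounded display of it.
held = 0.1 + 0.2
shown = format(held, ".2f")
print(repr(held), shown)
print(held == 0.3)                  # False: the held values differ
print(shown == format(0.3, ".2f"))  # True: the displays agree
```

The last two lines are the reason the held value is better assessed with
another formula than by looking at a default cell display.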

 - Dennis

PS: As much as I think test cases of the kind we have should not be in the
specification, and that "test cases" should not have standalone normative
significance, I yearn for some sort of demonstration formulas that allow
important characteristics of a function to be demonstrated.  I don't know
where these should be found and what the accompanying guidance should be.  

On OIC I said "I am thinking of simple relationships and identities that
should be apparent (or not) on inspection of certain cases, cases that
reveal actual deviations resulting from calculation and
number-representation limitations, and handling of [near-]edge cases and
cases generally thought of as undefined or out-of-bounds."

I still think that would be extremely valuable.  Not sure where it goes, if
anywhere, in respect to the OpenFormula specification.
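
As a rough illustration of what such demonstration formulas might look like,
here is a small Python sketch, again assuming IEEE 754 binary64 floats.  The
identities, the sample inputs, and the names `IDENTITIES` and `deviations` are
my own choices for illustration, not anything taken from the OpenFormula draft
or from OIC.  It asserts no tolerance; it only reports the deviation each
identity shows so that it can be noticed.

```python
import math

# Each entry maps a readable identity to a function returning (observed, expected).
IDENTITIES = {
    "sin(x)^2 + cos(x)^2 = 1": lambda x: (math.sin(x) ** 2 + math.cos(x) ** 2, 1.0),
    "exp(ln(x)) = x": lambda x: (math.exp(math.log(x)), x),
    "sqrt(x)^2 = x": lambda x: (math.sqrt(x) ** 2, x),
    "(1/x)*x = 1": lambda x: ((1.0 / x) * x, 1.0),
}

def deviations(x):
    """Return {identity: observed - expected} for one sample input x > 0."""
    result = {}
    for name, check in IDENTITIES.items():
        observed, expected = check(x)
        result[name] = observed - expected
    return result

for x in (3.0, 49.0, 1e-8, 1e8):
    for name, dev in deviations(x).items():
        print(f"x={x:<8g} {name:<26} deviation={dev:+.3e}")
```

A table like the one this prints is meant to be read, not passed or failed:
which identities hold exactly, which drift, and by how much for a given
implementation.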

-----Original Message-----
From: Eric Patterson [mailto:ericpa@exchange.microsoft.com] 
Sent: Monday, January 18, 2010 16:25
To: office-formula@lists.oasis-open.org
Subject: [office-formula] Test Cases

I'm starting to go through the open JIRA issues related to test cases.  My
goal is to ensure that the normative text accounts for any issues that have
been reported against the related test cases.  Many of the test cases
include an Epsilon value showing the acceptable range of return values.
Before dismissing this aspect of the test cases, we should make sure that we
are all in agreement that this is adequately covered in the written spec.  

I believe that the part below from section 3.3.1 (in Patrick's latest
upload) covers the precision issue.  If this can be added to the agenda for
tomorrow's meeting we can see if we have consensus.

Thanks,
Eric


Number
General
A number is simply a numeric value such as 0, -4.5, or $1000. Numbers shall
be able to represent fractional values (they shall not be limited to only
integers). The "number" type may be displayed in many different formats,
including date, time, percentage, and currency.
Typical implementations implement numbers as 64-bit IEEE floating point
values and use the CPU's floating-point instructions where available (so
intermediate values may be represented using more than 64 bits). However,
implementations have great freedom in how they implement Number, and may use
representations with a fixed bit length or a variable bit length. A cell
with a constant numeric value has the number type.
Note that many formula creators are not sophisticated in their understanding
of how computers determine their results. Many users, for example, do not
understand computer floating point arithmetic models, and have no idea that
many implementations use a base other than 10 (or what that would mean). In
particular, many implementations use base 2 representations, with the result
that value 0.1 can only be represented imprecisely (just as 1/3 can only be
imprecisely represented in a base 10 decimal representation). This problem
applies to all uses of typical computing equipment, including nearly all
programming languages, and this specification does not attempt to fully
resolve the problem of unsophisticated users.
Rationale: Originally some efforts were expended to try to make formulas
produce the "expected answer" for unsophisticated users. In particular, the
equal-to operator for numbers matches imprecisely in many applications,
because many users do not understand that (1/3)*3 on most implementations
will produce a value close to one but not precisely equal to one. Originally
there was a test to ensure that (1/3)*3 was equal to 1. The Gnumeric
developers objected, on the grounds that requiring that equality be "sloppy"
made it very difficult for sophisticated users to use spreadsheets to their
full capabilities. In contrast, the function INT still requires that
INT((1/3)*3) is 1, because if INT does not do so, many users' spreadsheets
will not work as they expect. The expected answers of INT may not make
numerical analysts happy, but users will get what appears (to them) to be
wrong answers otherwise.
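
The two behaviours described in the excerpt above can be sketched in Python,
assuming IEEE 754 binary64 floats with round-to-nearest.  Two caveats: in that
arithmetic (1/3)*3 happens to round to exactly 1.0, so the sketch uses
(1/49)*49, which does not; and `tolerant_int` with its `tol` parameter is a
hypothetical helper of my own, not the rule the draft prescribes for INT.

```python
import math
from decimal import Decimal

# Decimal(float) gives the exact decimal expansion of the stored binary value.
stored = Decimal(0.1)
print(stored)                    # slightly more than 0.1
print(stored == Decimal("0.1"))  # False

product = (1 / 49) * 49
print(product == 1.0)            # False: just below 1
print(math.floor(product))       # 0, the answer that surprises users

def tolerant_int(x, tol=1e-14):
    """Round down, but first snap x to an integer it is within tol of.

    Hypothetical illustration only; tol is an arbitrary assumed tolerance.
    """
    nearest = round(x)
    if math.isclose(x, nearest, rel_tol=tol, abs_tol=tol):
        return int(nearest)
    return math.floor(x)

print(tolerant_int(product))     # 1
print(tolerant_int(2.5))         # 2
```

Exact equality stays strict here, which is what the Gnumeric developers asked
for, while only the integer-truncating function absorbs the last-bit error.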
