
Subject: Thoughts on OpenFormula, execution context, calculation settings, limits and how we put this all together.
From: robert_weir@us.ibm.com
To: "office-formula@lists.oasis-open.org" <office-formula@lists.oasis-open.org>
Date: Sun, 14 Mar 2010 18:46:51 -0400
As we've completed OpenFormula, we've noticed that there are a number of 
places where we have dependencies on the OpenFormula "host", the thing that 
the formula evaluator is embedded in.  Although the primary and immediate 
host is an ODF 1.2 spreadsheet application, it has been the intent, from 
the start, for OpenFormula not to depend directly on the definition of a 
spreadsheet.  The hope is that this formula language would be usable in 
other contexts as well.

We've also identified a number of areas where we've agreed that we cannot, 
or should not, mandate the exact results.  For example, since we do not 
mandate specific numeric models, we do not mandate precision requirements 
for individual functions.  Nor do we mandate integer limits, character 
limits, etc.  However, some of our functions could be more rigorously 
defined if we could point to a limits definition, even if the limits were 
implementation-defined. 

I think we can address both of these issues in a single new clause, that 
clearly defines the parameters of an abstract evaluator.  In some cases 
these parameters will be implementation-defined, and in some cases the 
parameters will be defined by ODF 1.2 Part 1.  But if we can enumerate 
each parameter and give it a label, then we can refer to these evaluator 
parameters in our function definitions.

I'll give just an outline of what I mean.  I'm not wed to this approach, 
but I think something along these lines can improve the rigor of the 
existing text.

In Chapter 2 "Formula Processing Model" we insert a new section 2.2, which 
would say:

This standard defines the requirements for Formula Expressions and Formula 
Evaluators.  A Formula Expression is a Unicode string that conforms to the 
requirements of Chapter 4.

A Formula Evaluator is a program that takes a Formula Expression as input, 
interprets the Formula Expression, and returns a value.

The requirements of a Formula Evaluator are defined in terms of an 
abstract machine which we term the Formula Evaluator Abstract Machine 
(FEAM).  A Formula Evaluator need not be implemented according to the 
details of this abstract machine, but it shall, in its external behaviors, 
conform to the stated requirements of a Formula Evaluator. 

The FEAM operates in an execution environment where it has access to 
Calculation Primitives, a Reference Resolver and a set of Calculation 
Settings.

The FEAM's Calculation Primitives are:

- the basic arithmetic operations of addition (+), subtraction (-), 
  multiplication (*) and division (/)
- the trigonometric functions sine, cosine and tangent, as well as their 
  inverses arcsine, arccosine and arctangent
- evaluation of summations
- numeric integration of definite and indefinite integrals
- date calculations using the proleptic Gregorian calendar, including 
  day-of-week calculations, calculation of the difference between two 
  dates, and calculation of a date that is a constant number of days 
  before or after a given date.
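To make the date primitives concrete, here is a rough sketch in Python, whose datetime.date happens to follow the proleptic Gregorian calendar (though only for years 1 through 9999). The helper names are hypothetical, and the ISO day-of-week numbering (Monday = 1) is just one possible convention, not something the proposal fixes.

```python
from datetime import date, timedelta

# datetime.date uses the proleptic Gregorian calendar (Gregorian rules
# extended backwards), which is the calendar the date primitives call for.

def day_of_week(d: date) -> int:
    """Day of week in ISO numbering: Monday = 1 ... Sunday = 7."""
    return d.isoweekday()

def days_between(start: date, end: date) -> int:
    """Signed number of days from start to end."""
    return (end - start).days

def add_days(d: date, n: int) -> date:
    """The date n days after d (n < 0 gives a date before d)."""
    return d + timedelta(days=n)

if __name__ == "__main__":
    d = date(2010, 3, 14)
    print(day_of_week(d))                      # 7 (a Sunday)
    print(days_between(date(2010, 1, 1), d))   # 72
    print(add_days(d, -14))                    # 2010-02-28
```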

[Generally, we state all the requirements that we have on the Evaluator. 
All of our functions should then be defined to require only the primitives 
that we state here.  We might make use of ISO/IEC 10967, "Language 
Independent Arithmetic" (http://en.wikipedia.org/wiki/ISO/IEC_10967).

This forces us to acknowledge that we have two mathematical notations in 
play here.  In effect we already use both, but we are not always clear 
about which one is meant.  I think the presentation of the definitions 
improves if we are explicit about this.  We have the notation of the 
OpenFormula syntax and the notation of the Calculation Primitives.  In 
some cases the two notations may overlap substantially.  But I think we 
must, via some typographical convention, make it absolutely clear which 
one we are referring to at any given time.]

The Reference Resolver takes as input a Unicode string containing a 
Reference according to section 4.8 and returns a resolved value.  We 
express an invocation of the Reference Resolver in function notation as:

REFERENCE-RESOLVER(Reference)
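A sketch of what a host-supplied resolver could look like, in Python. The type names and make_table_resolver() are hypothetical; the point is only that the evaluator sees a function from Reference strings to values, so a non-spreadsheet host could plug in something as simple as a lookup table.

```python
from typing import Callable, Mapping, Union

# Hypothetical value domain; OpenFormula's real types are richer.
Value = Union[float, str, bool]

# REFERENCE-RESOLVER(Reference): takes a Reference string, returns a value.
ReferenceResolver = Callable[[str], Value]

def make_table_resolver(table: Mapping[str, Value]) -> ReferenceResolver:
    """Build a resolver for a hypothetical non-spreadsheet host in which a
    Reference is simply a key in a lookup table."""
    def resolve(reference: str) -> Value:
        if reference not in table:
            raise LookupError(f"cannot resolve reference {reference!r}")
        return table[reference]
    return resolve

if __name__ == "__main__":
    resolve = make_table_resolver({"Width": 2.0, "Height": 3.0})
    print(resolve("Width") * resolve("Height"))   # 6.0
```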

The following Calculation Settings are available to the FEAM:

DATE-BASE
DATE-LEAPYEAR-SKIP-1900
ZERO-POWER-ZERO
MAX-INT
MAX-NUMBER
[We probably have a few dozen like this.]
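As an illustration, the settings could be modeled as one immutable record that function definitions consult. Everything here is an assumption made for the sketch: the Python field names, the types, the defaults (1899-12-30 as DATE-BASE follows a common spreadsheet convention, 2**53 as MAX-INT assumes IEEE 754 doubles), and the choice to represent "0^0 is an error" as None.

```python
import math
import sys
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

@dataclass(frozen=True)
class CalculationSettings:
    """Hypothetical record of the FEAM's Calculation Settings.
    Field names mirror the proposed labels; types and defaults are
    illustrative assumptions only."""
    date_base: date = date(1899, 12, 30)      # DATE-BASE: date denoted by serial 0
    date_leapyear_skip_1900: bool = False     # DATE-LEAPYEAR-SKIP-1900 (unused here)
    zero_power_zero: Optional[float] = 1.0    # ZERO-POWER-ZERO: None means error
    max_int: int = 2 ** 53                    # MAX-INT
    max_number: float = sys.float_info.max    # MAX-NUMBER

class EvaluationError(Exception):
    """Stand-in for an OpenFormula error value."""

def serial_to_date(serial: int, s: CalculationSettings) -> date:
    """Interpret a date serial number relative to DATE-BASE."""
    return s.date_base + timedelta(days=serial)

def power(x: float, y: float, s: CalculationSettings) -> float:
    """POWER(x; y), deferring the 0^0 case to ZERO-POWER-ZERO.
    Other domain errors are left to math.pow (which raises ValueError)."""
    if x == 0 and y == 0:
        if s.zero_power_zero is None:
            raise EvaluationError("0^0 is an error under these settings")
        return s.zero_power_zero
    return math.pow(x, y)

if __name__ == "__main__":
    s = CalculationSettings()
    print(serial_to_date(40251, s))                                   # 2010-03-14
    print(serial_to_date(0, replace(s, date_base=date(1904, 1, 1))))  # 1904-01-01
    print(power(0.0, 0.0, s))                                         # 1.0
```

The function definitions never change; only the record bound to them does, which is roughly the split between Part 2 and Part 1 described later in this message.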

====================================================

If we do the above, then we should be able to avoid almost all of the 
functions in OpenFormula that are under-defined or left 
implementation-defined.  The trick is this:  although we cannot specify 
the details of the numeric model in the Evaluator, we can simply treat 
them as a priori defined Calculation Settings.  Note that this is similar 
to how C treats numeric limits in <stdint.h>, <float.h> and <limits.h>.
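To carry the C analogy a bit further: for a host whose numbers are IEEE 754 doubles, the two numeric limits could be derived much as C code reads DBL_MAX and DBL_MANT_DIG from <float.h>. Reading MAX-INT as "the largest integer up to which every integer is exactly representable" is an assumption of this sketch, not something the proposal has defined.

```python
import sys

# For a host whose numbers are IEEE 754 binary64 (Python's float), the
# limits can be read off sys.float_info, much as C code reads DBL_MAX and
# DBL_MANT_DIG from <float.h>.
MAX_NUMBER = sys.float_info.max            # largest finite double, about 1.8e308
MAX_INT = 2 ** sys.float_info.mant_dig     # 2**53 when mant_dig == 53

# Every integer with magnitude up to MAX_INT is exactly representable;
# just above it, adjacent integers collapse onto the same double.
assert float(MAX_INT - 1) != float(MAX_INT)
assert float(MAX_INT) + 1.0 == float(MAX_INT)
```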

So take the EVEN() function today, where Dennis observed that for large 
numbers this function is meaningless, since there might not be sufficient 
numeric significance when the value is cast to an integer.  We could state 
that by adding to the definition of EVEN():

"If N > MAX-INT, the value returned by this function is undefined."

That's the main idea.  If we couldn't refer to MAX-INT by name, we would 
be limited to saying nothing, or to making vague statements like "If N is 
larger than the largest integer which can be expressed in the given 
numeric processor....".  It is best to encapsulate that important concept 
once, give it a label (MAX-INT) and then refer to it as needed.
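A minimal Python sketch of EVEN() with the proposed guard. The max_int parameter only stands in for the MAX-INT setting, which is a specification-level label rather than something queried at runtime. The UndefinedResult exception, and guarding on |N| so that large negative inputs are covered too, are assumptions of the sketch.

```python
import math

class UndefinedResult(Exception):
    """Hypothetical signal for 'the value returned by this function is undefined'."""

def even(n: float, max_int: int) -> int:
    """EVEN(N): round N away from zero to the nearest even integer.
    max_int stands in for the MAX-INT Calculation Setting."""
    if abs(n) > max_int:
        raise UndefinedResult(f"EVEN({n}): |N| exceeds MAX-INT")
    m = math.ceil(abs(n))   # round the magnitude up, i.e. away from zero
    if m % 2 == 1:
        m += 1              # bump an odd result to the next even integer
    return m if n >= 0 else -m

if __name__ == "__main__":
    limit = 2 ** 53
    print([even(x, limit) for x in (0, 1, 2, 2.5, -1, -2.5)])  # [0, 2, 2, 4, -2, -4]
```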

(Note that we're not saying that MAX-INT can be queried at runtime.  It is 
purely a tool for clarifying the concepts in the specification, though we 
might consider making these actual runtime entities in a future release.)

In Part 1, we can then do the following:

1) Define the behavior of the Reference Resolver, namely recursive 
evaluation of cells, including the treatment of out-of-range references 
and circular references.
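For item 1, a rough Python sketch of a recursive Reference Resolver with range and cycle checks. Cell contents are modeled as either constants or Python callables standing in for parsed formulas. Spreadsheet applications usually report these conditions as error values in the cell rather than aborting, so the exception classes here are hypothetical simplifications. The cache means each successfully evaluated cell is computed only once.

```python
from typing import Callable, Dict, Set, Union

Value = Union[float, str]
# Hypothetical stand-in for a cell formula: a callable that receives the
# resolver and returns a value. Real cells would hold OpenFormula text.
Formula = Callable[[Callable[[str], Value]], Value]

class CircularReference(Exception):
    pass

class OutOfRangeReference(Exception):
    pass

class SheetResolver:
    """Resolves cell references by recursively evaluating referenced cells."""

    def __init__(self, cells: Dict[str, Union[Value, Formula]]):
        self.cells = cells
        self.cache: Dict[str, Value] = {}     # cells already evaluated
        self.in_progress: Set[str] = set()    # cells on the current evaluation path

    def __call__(self, ref: str) -> Value:
        if ref in self.cache:
            return self.cache[ref]
        if ref not in self.cells:
            raise OutOfRangeReference(ref)
        if ref in self.in_progress:
            raise CircularReference(ref)      # ref (indirectly) depends on itself
        content = self.cells[ref]
        if callable(content):
            self.in_progress.add(ref)
            try:
                value = content(self)         # may call back into the resolver
            finally:
                self.in_progress.discard(ref)
        else:
            value = content                   # constant cell
        self.cache[ref] = value
        return value

if __name__ == "__main__":
    r = SheetResolver({
        "A1": 2.0,
        "A2": lambda res: res("A1") * 10,
        "B1": lambda res: res("B2"),
        "B2": lambda res: res("B1"),
    })
    print(r("A2"))   # 20.0
```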

2) Set the values of the Calculation Settings.  In some cases these will be 
set explicitly.  In most cases we can say that they are 
implementation-defined.  But I think there is value in _not_ declaring 
them implementation-defined in Part 2.  If we avoid that, we make it 
easier for others to use OpenFormula in another context and define these 
settings more rigorously there.

Note that this overall approach essentially encourages us to enumerate and 
label all implementation-dependent, implementation-defined and undefined 
behaviors and dependencies.  Aside from clarifying this specification, 
such information can be very useful for conformance testing, as well as 
to anyone who wants to reuse OpenFormula in another context.
Follow-Ups:
- Re: [office-formula] Thoughts on OpenFormula, executationcontext, calculation settings, limits and how we put this all together.
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>