office-formula message

Subject: Re: [office-formula] Thoughts on OpenFormula, executation context, calculation settings, limits and how we put this all together.
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: office-formula@lists.oasis-open.org
Date: Mon, 15 Mar 2010 13:56:17 -0400 (EDT)

Rob Weir:
> As we've completed OpenFormula, we've noticed that there are a number of 
> places where we have dependencies on the OpenFormula "host", the thing that 
> the formula evaluator is embedded in.

I'm not sure we need to separate the "evaluator" from the "host".
I'd rather consider it all just properties of the "evaluator".  As long as the evaluator
meets its specs, it doesn't matter what's in the "evaluator proper" vs. its "host".

>  Although the primary and immediate 
> host is an ODF 1.2 spreadsheet application, it has been the intent, from 
> the start, for OpenFormula not to depend directly on the definition of a 
> spreadsheet.  The hope is that this formula language would be usable in 
> other contexts as well.

Completely agree.

> We've also identified a number of areas where we've agreed that we can 
> not/should not mandate the exact results.  For example, since we do not 
> mandate specific numeric models, we do not mandate precision requirements 
> for individual functions.  Nor do we mandate integer limits, character 
> limits, etc.  However, some of our functions could be more rigorously 
> defined if we could point to a limits definition, even if the limits were 
> implementation-defined. 

If we can do it soon, great.  If we get stuck, though, I think this would be a
reasonable thing to put off to a future release.

> I think we can address both of these issues in a single new clause, that 
> clearly defines the parameters of an abstract evaluator.  In some cases 
> these parameters will be implementation-defined, and in some cases the 
> parameters will be defined by ODF 1.2 Part 1.  But if we can enumerate 
> each parameter and give it a label, then we can refer to these evaluator 
> parameters in our function definitions.
> 
> I'll give just an outline of what I mean.  I'm not wed to this approach, 
> but I think something along these lines can improve the rigor of the 
> existing text.
> 
> In Chapter 2 "Formula Processing Model" we insert a new section 2.2, which 
> would say:
> 
> This standard defines the requirements for Formula Expressions and Formula 
> Evaluators.  A Formula Expression is a Unicode string which conforms to 
> the requirements of chapter 4. 

I'd remove "Unicode", that's confusing in context, but otherwise fine.

> A Formula Evaluator is a program that takes a Formula Expression as input, 
> interprets the Formula Expression, and returns a value.

Okay, that's reasonable.

> The requirements of a Formula Evaluator are defined in terms of an 
> abstract machine which we term the Formula Evaluator Abstract Machine 
> (FEAM).  A Formula Evaluator need not be implemented according to the 
> details of this abstract machine, but it shall, in its external behaviors, 
> conform to the stated requirements of a Formula Evaluator. 
> 
> The FEAM operates in an execution environment where it has access to 
> Calculation Primitives, a Reference Resolver and a set of Evaluation 
> Settings.
> 
> The FEAM's Calculation Primitives are:
> 
> * the basic arithmetic operations of addition (+), subtraction (-), 
>   multiplication (*) and division (/)
> * the trigonometric functions of sine, cosine and tangent, as well as their 
>   inverses arcsine, arccosine and arctangent
> * evaluation of summations
> * numeric integration of definite and indefinite integrals
> * date calculations using the proleptic Gregorian calendar, including day of 
>   week calculations, calculation of the difference between two dates, and 
>   calculating a date that is a constant number of days before or after a 
>   given date.

I'm not sure this is worth it.  We can just state the mathematical
properties we need *directly* for a given function... we don't really
need to appeal to a "FEAM".  And what if an implementation does BETTER
than its "FEAM" for some functions?  Is that okay?

> [Generally, we state all the requirements that we have on the Evaluator. ...]

> All of our functions should then be defined to require only the primitives 
> that we state here...

There's no reason to do that.  I think it's reasonable to assume that readers
already know standard mathematical notation.

> This forces us to acknowledge that we have mathematical notations in play 
> here. We sort of already have two notations going, but we're not always 
> clear.  I think the presentation of the definitions is improved if we are 
> explicit in this.  We have the notation of the OpenFormula syntax and the 
> notation of the Calculation Primitives.  In some cases there may be 
> substantial overlap in notation.  But I think we must, via some 
> typographical convention, make it absolutely clear which one we are 
> referring to at any given time.]

Now here I agree.  We *do* have two notations for defining functions:
* In some cases, we define OpenFormula functions in terms of other OpenFormula
functions.  That's okay as long as there is no loop (!).  These are uppercase fixed-width.
* Standard mathematical notation.
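
The "no loop" condition on the first notation can be checked mechanically. A minimal sketch in Python, assuming we kept a table of which functions each definition refers to; the table name, its entries, and `find_loop` are all hypothetical and only illustrate the check, they are not taken from the spec:

```python
# Hypothetical dependency table: each function maps to the OpenFormula
# functions its definition refers to. The entries are illustrative only.
DEFINED_IN_TERMS_OF = {
    "AVERAGE": ["SUM", "COUNT"],
    "SUM": [],
    "COUNT": [],
}

def find_loop(deps):
    """Return one cycle as a list of names, or None if definitions are loop-free."""
    DONE, ACTIVE = 1, 2
    state = {}

    def visit(name, path):
        if state.get(name) == DONE:
            return None
        if state.get(name) == ACTIVE:
            # We came back to a function still being expanded: a loop.
            return path[path.index(name):] + [name]
        state[name] = ACTIVE
        for dep in deps.get(name, []):
            cycle = visit(dep, path + [name])
            if cycle:
                return cycle
        state[name] = DONE
        return None

    for name in deps:
        cycle = visit(name, [])
        if cycle:
            return cycle
    return None
```

With the table above `find_loop(DEFINED_IN_TERMS_OF)` returns `None`; for a made-up pair such as `{"A": ["B"], "B": ["A"]}` it returns `["A", "B", "A"]`.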

I think we *should* clearly differentiate the two.  I don't know what to call them,
or how to more clearly distinguish them... suggestions?

...
> If we do the above, then we should be able to avoid almost all 
> implementation-defined under-defined functions in the text of OpenFormula. 
> The trick is this:  Although we cannot specify the details of the numeric 
> model in the Evaluator, we can simply treat these as a priori defined 
> Calculation Settings.  Note this is similar to how C treats numeric 
> limits in <stdint.h>, <float.h> and <limits.h>.

There's a lot you CAN'T say.
For example, given some BIGNUM, will adding SMALLNUM
produce a different number?  A different number, but with loss of precision?
Even given IEEE 754, you have varying bases, and we're not trying to limit things to that.
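
To make the BIGNUM/SMALLNUM question concrete, here is a small sketch in Python. It assumes one particular numeric model, IEEE 754 binary64 (what Python's `float` uses on common platforms); the values picked for `BIGNUM` and `SMALLNUM` are illustrative only. A different model or base would give different answers, which is exactly the difficulty.

```python
import sys

# Assumption: Python floats are IEEE 754 binary64 (53-bit significand).
BIGNUM = 2.0 ** 53    # beyond this magnitude, not every integer is representable
SMALLNUM = 1.0

# Adding SMALLNUM produces no different number at all: it is absorbed.
absorbed = (BIGNUM + SMALLNUM) == BIGNUM

# A slightly larger addend does produce a different number.
changed = (BIGNUM + 2 * SMALLNUM) != BIGNUM

# A different number, but with loss of precision: we add 3 and the
# result moves by 4, because representable values near 2**54 are 4 apart.
lossy = (BIGNUM * 2 + 3.0) - BIGNUM * 2

# The rough analogue of C's <float.h>: limits this implementation exposes.
mant_bits = sys.float_info.mant_dig
radix = sys.float_info.radix
```

Under the stated assumption, `absorbed` and `changed` are both `True` and `lossy` is `4.0`. Knowing `mant_bits` and `radix` explains these results here, but a single labeled limit would not by itself predict them for every numeric model.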

> So take the EVEN() function today, where Dennis observed that for large 
> numbers, this function is meaningless, since there might not be 
> sufficient numeric significance when cast to an integer.  We could state 
> that by saying in the definition of EVEN():
> 
> "If N > MAX-INT, the value returned by this function is undefined."
> 
> That's the main idea.  If we didn't have the ability to refer to MAX-INT 
> by name, then we're limited to saying nothing, or making vague statements 
> like "If N is larger than the largest integer which can be expressed in 
> the given numeric processor....".  Best to encapsulate that important 
> concept once, give it a label (MAX-INT) and then refer to it as needed.
> 
> (Note that we're not saying that MAX-INT can be queried at runtime.  It is 
> purely a tool for clarifying the concepts in the specification.  Though we 
> might consider making these be actual runtime entities in a future 
> release.)

If you wanted to go *this* way, you could go further, e.g., saying that
MAX-INT must be at least some value.
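
A rough sketch of how that could look, in Python. Everything named here is an assumption for illustration: the label is spelled `MAX_INT` because a hyphen is not valid in a Python identifier, its value of 2**53 assumes IEEE 754 binary64, the floor `MIN_REQUIRED_MAX_INT` is a made-up number, and returning `None` merely stands in for "undefined". The check uses the magnitude of N, which goes slightly beyond the quoted wording "N > MAX-INT".

```python
import math

# Hypothetical label from the proposal; 2**53 assumes IEEE 754 binary64,
# where every integer up to this magnitude is exactly representable.
MAX_INT = 2 ** 53

# Hypothetical floor a spec could require of every implementation.
MIN_REQUIRED_MAX_INT = 2 ** 31 - 1
assert MAX_INT >= MIN_REQUIRED_MAX_INT

def even(n):
    """Round n away from zero to the nearest even integer.

    Returns None to stand in for "undefined" when |n| > MAX_INT.
    """
    if abs(n) > MAX_INT:
        return None
    magnitude = math.ceil(abs(n) / 2) * 2
    return int(math.copysign(magnitude, n))
```

Here `even(3)` gives `4`, `even(-1.5)` gives `-2`, and `even(1e300)` gives `None`.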

> In Part 1, we can then do the following:
> 
> 1) Define the behavior of the Reference Resolver, namely recursive 
> evaluation of cells, including treatment of out of range references and 
> circular references.
> 
> 2) Set the value of the Calculation Settings.  In some cases these will be 
> set explicitly.  In most cases we can say that they are 
> implementation-defined.  But I think there is some good to _not_ saying 
> these are implementation-defined in Part 2.  If we avoid that, then we 
> make it easier for others to use OpenFormula in a way that defines these 
> more rigorously in a given context.
> 
> Note that this overall approach essentially encourages us to enumerate and 
> label all implementation-dependent, implementation-defined and undefined 
> behaviors and dependencies.  Aside from clarifying this specification, 
> such information can be very useful on the conformance testing side, as 
> well as be generally useful to anyone who wants to reuse OpenFormula in 
> another context.


I fear trying to add such a model at this late stage will hold us up endlessly.
I'd rather push that off to a future version.

HOWEVER, I think improving the description of the evaluator (sans FEAM),
and clarifying which notation is which in each function, is a good idea,
and one we could do now.

--- David A. Wheeler

References:
- Thoughts on OpenFormula, executation context, calculation settings, limits and how we put this all together.
  - From: robert_weir@us.ibm.com