Subject: Thoughts on OpenFormula, execution context, calculation settings, limits and how we put this all together.
As we've completed OpenFormula, we've noticed that there are a number of places where we have dependencies on the OpenFormula "host", the thing that the formula evaluator is embedded in. Although the primary and immediate host is an ODF 1.2 spreadsheet application, it has been the intent, from the start, for OpenFormula not to depend directly on the definition of a spreadsheet. The hope is that this formula language would be usable in other contexts as well.

We've also identified a number of areas where we've agreed that we cannot/should not mandate the exact results. For example, since we do not mandate specific numeric models, we do not mandate precision requirements for individual functions. Nor do we mandate integer limits, character limits, etc. However, some of our functions could be more rigorously defined if we could point to a limits definition, even if the limits were implementation-defined.

I think we can address both of these issues in a single new clause that clearly defines the parameters of an abstract evaluator. In some cases these parameters will be implementation-defined, and in some cases the parameters will be defined by ODF 1.2 Part 1. But if we can enumerate each parameter and give it a label, then we can refer to these evaluator parameters in our function definitions.

I'll give just an outline of what I mean. I'm not wed to this approach, but I think something along these lines can improve the rigor of the existing text.

In Chapter 2 "Formula Processing Model" we insert a new section 2.2, which would say:

This standard defines the requirements for Formula Expressions and Formula Evaluators. A Formula Expression is a Unicode string which conforms to the requirements of chapter 4. A Formula Evaluator is a program that takes a Formula Expression as input, interprets the Formula Expression, and returns a value.
The requirements of a Formula Evaluator are defined in terms of an abstract machine which we term the Formula Evaluator Abstract Machine (FEAM). A Formula Evaluator need not be implemented according to the details of this abstract machine, but it shall, in its external behaviors, conform to the stated requirements of a Formula Evaluator.

The FEAM operates in an execution environment where it has access to Calculation Primitives, a Reference Resolver and a set of Calculation Settings.

The FEAM's Calculation Primitives are:

  - the basic arithmetic operations of addition (+), subtraction (-), multiplication (*) and division (/)
  - the trigonometric functions of sine, cosine and tangent, as well as their inverses arcsine, arccosine and arctangent
  - evaluation of summations
  - numeric integration of definite and indefinite integrals
  - date calculations using the proleptic Gregorian calendar, including day of week calculations, calculation of the difference between two dates, and calculating a date that is a constant number of days before or after a given date.

[Generally, we state all the requirements that we have on the Evaluator. All of our functions should then be defined to require only the primitives that we state here. We might make use of ISO/IEC 10967 "Language Independent Arithmetic" (http://en.wikipedia.org/wiki/ISO/IEC_10967).

This forces us to acknowledge that we have mathematical notations in play here. We sort of already have two notations going, but we're not always clear about which is which. I think the presentation of the definitions is improved if we are explicit about this. We have the notation of the OpenFormula syntax and the notation of the Calculation Primitives. In some cases there may be substantial overlap in notation. But I think we must, via some typographical convention, make it absolutely clear which one we are referring to at any given time.]

The Reference Resolver takes as input a Unicode string containing a Reference according to section 4.8 and returns a resolved value.
We express an invocation of the Reference Resolver in a function notation as:

  REFERENCE-RESOLVER(Reference)

The following Calculation Settings are available to the FEAM:

  DATE-BASE
  DATE-LEAPYEAR-SKIP-1900
  ZERO-POWER-ZERO
  MAX-INT
  MAX-NUMBER

[We probably have a few dozen like this.]

====================================================

If we do the above, then we should be able to avoid almost all implementation-defined or under-defined functions in the text of OpenFormula. The trick is this: although we cannot specify the details of the numeric model in the Evaluator, we can simply treat these as a priori defined Calculation Settings. Note this is similar to how C treats numeric limits in <stdint.h>, <float.h> and <limits.h>.

So take the EVEN() function today, where Dennis observed that for large numbers this function is meaningless, since there might not be sufficient numeric significance when cast to an integer. We could state that by saying in the definition of EVEN():

  "If N > MAX-INT, the value returned by this function is undefined."

That's the main idea. If we didn't have the ability to refer to MAX-INT by name, then we'd be limited to saying nothing, or making vague statements like "If N is larger than the largest integer which can be expressed in the given numeric processor....". Best to encapsulate that important concept once, give it a label (MAX-INT) and then refer to it as needed.

(Note that we're not saying that MAX-INT can be queried at runtime. It is purely a tool for clarifying the concepts in the specification. Though we might consider making these be actual runtime entities in a future release.)

In Part 1, we can then do the following:

1) Define the behavior of the Reference Resolver, namely recursive evaluation of cells, including treatment of out-of-range references and circular references.

2) Set the value of the Calculation Settings. In some cases these will be set explicitly. In most cases we can say that they are implementation-defined.
But I think there is some good in _not_ saying these are implementation-defined in Part 2. If we avoid that, then we make it easier for others to use OpenFormula in a way that defines these more rigorously in a given context.

Note that this overall approach essentially encourages us to enumerate and label all implementation-dependent, implementation-defined and undefined behaviors and dependencies. Aside from clarifying this specification, such information can be very useful on the conformance testing side, as well as generally useful to anyone who wants to reuse OpenFormula in another context.
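
To make the EVEN() / MAX-INT idea a bit more concrete, here is a rough Python sketch of how a host might supply the Calculation Settings and a Reference Resolver to an evaluator. Everything in it is an assumption made for illustration: the class and field names, the default value chosen for MAX-INT, the "[.A1]" reference string, and the choice to apply the limit to the magnitude of N rather than to N itself. It also models "undefined" as an exception, which is only one of several ways a host could surface it. The settings appear as ordinary values here only so the sketch can run; that is not meant to suggest they must be queryable at runtime.

```python
import math
from dataclasses import dataclass
from typing import Callable


class UndefinedResult(Exception):
    """Marks a result that the specification would leave undefined."""


@dataclass(frozen=True)
class CalculationSettings:
    # Hypothetical defaults; the proposal leaves the real values to the
    # host (for example ODF 1.2 Part 1) or to the implementation.
    max_int: int = 2**53          # stands in for MAX-INT
    zero_power_zero: float = 1.0  # stands in for ZERO-POWER-ZERO


@dataclass(frozen=True)
class Evaluator:
    settings: CalculationSettings
    # Stands in for REFERENCE-RESOLVER(Reference): string in, value out.
    resolve: Callable[[str], float]

    def even(self, n: float) -> int:
        """EVEN(N): round N away from zero to the nearest even integer."""
        # Assumption: the limit is applied to |N| so that large negative
        # inputs are treated the same way as large positive ones.
        if abs(n) > self.settings.max_int:
            raise UndefinedResult("EVEN(N) is undefined beyond MAX-INT")
        magnitude = math.ceil(abs(n) / 2) * 2
        return int(math.copysign(magnitude, n)) if magnitude else 0


# A toy host: one cell, and a deliberately small MAX-INT.
cells = {"[.A1]": 7.0}
host = Evaluator(CalculationSettings(max_int=1000), cells.__getitem__)
result = host.even(host.resolve("[.A1]"))
```

The point of the sketch is only that EVEN() never mentions a concrete numeric model; it refers to a named setting, and a different host could plug in different values without touching the function definition.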