Subject: Re: [office-formula] Key Issues

From: robert_weir@us.ibm.com
To: office-formula@lists.oasis-open.org
Date: Mon, 27 Feb 2006 10:54:16 -0500

"David A. Wheeler" <dwheeler@dwheeler.com> wrote on 02/25/2006 02:59:59 PM:
> * Should we use OpenFormula as a base specification? Wheeler
> proposes yes; it's easier to start with a document and make changes
> than to start with a blank page. We don't need to decide this at
> the kickoff, but we need to decide soon, say by March 10. Between
> now and then, please read the OpenFormula draft. The question isn't
> whether or not you believe every word (it WILL change), but whether
> or not starting with it is better than starting with a blank page.
> Wheeler can give a brief overview of OpenFormula via the mailing
> list. Are there any major questions about OpenFormula?

I certainly wouldn't start with a blank page. I think the submitted draft is an excellent starting point for our work.

> * Schedule: We need to define syntax, then semantics. Proposal:
> Draft syntax defined by start+2 months; final syntax/semantics (inc.
> function definitions) defined by start+6 months. Should "start" be March 2?
> * Issue: Goals/range. Rob Weir and David A. Wheeler both want it to
> support both limited resources (e.g., tiny PDAs) and massive
> capabilities. Yet if everything is optional, no one will know what
> is supported, and users won't have real interoperability. A
> solution for this problem is to define "levels" (OpenFormula did
> this). Do we agree that we should have multiple levels?

I'm a little uncomfortable with the concept of levels here. In my mind they are rather hierarchical and do not admit uses that fall outside the conventional, prescribed ones. What one person or one industry considers an advanced function may be an everyday function for someone in another industry. Lumping them all together into levels may not match anyone's real use. For example, advanced financial functions and advanced math functions are not going to be used by the same user. But I could imagine a PDA-based implementation used by an optical engineer who needs the Bessel functions but would waste a lot of memory if they had to fully implement Level 4.

On the other hand, I don't really have an alternative solution to this. But it is worth thinking outside the box on this a little. For example, how is a similar problem solved in other contexts? Some take a library-oriented approach. C/C++ groups related functions into libraries, each declared in its own header: math.h, stdio.h, etc. In cases where you have a less-than-complete implementation of the language, I've often seen them just leave out a library entirely, like signal.h. Similarly with Java, being object-oriented as it is, different "levels" are defined by which packages are included.

I wonder if an approach like this would work? Define "packages" with names like "basic", "database", "statistics", "finance", "math", "info", etc. An implementation could pick and choose what they support at the granularity of a package, but compliance doesn't require that they support any specific set of packages, other than perhaps "basic". Perhaps you could also make clear, via namespaces, which functions are within which package.
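
To make the package idea concrete, here is a minimal Python sketch. The package names come from the list above; which functions belong to which package, the "package:FUNCTION" naming, and the rule that only "basic" is mandatory are assumptions for illustration, and both helpers are hypothetical.

```python
# Hypothetical package table: which functions belong to which package.
PACKAGES = {
    "basic": {"SUM", "IF", "ABS", "ROUND"},
    "statistics": {"AVERAGE", "STDEV", "MEDIAN"},
    "finance": {"PV", "FV", "NPV"},
    "math": {"BESSELJ", "GAMMALN"},
}

def unsupported_functions(used, supported_packages):
    """Return, sorted, the functions in `used` not covered by `supported_packages`.

    Assumes compliance requires only "basic"; every other package is
    opt-in, all or nothing.
    """
    if "basic" not in supported_packages:
        raise ValueError('every implementation must support "basic"')
    available = set()
    for pkg in supported_packages:
        available |= PACKAGES[pkg]
    return sorted(set(used) - available)

def qualified(name):
    """Namespace a function by its package, e.g. "math:BESSELJ" (hypothetical scheme)."""
    for pkg, funcs in PACKAGES.items():
        if name in funcs:
            return f"{pkg}:{name}"
    raise KeyError(name)
```

With this, the optical engineer's PDA implementation could declare {"basic", "math"}: `unsupported_functions({"SUM", "BESSELJ"}, {"basic", "math"})` is empty, while a document using NPV would be flagged, without the PDA carrying any of "finance".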

> ** If we have levels, must define the levels. Discuss briefly
> what we want, what are the levels. Can we try to create rough
> resolution by March 17 on the levels, if we agree to define levels?
> (OpenFormula had four levels; we don't need to have 4, or use its
> breakdown, unless we like it.)

See above.

> * Scope: Define this specification as ONLY an interchange format,
> and at most RECOMMEND user interface issues? Wheeler recommends
> defining the spec as ONLY an interchange format. Spreadsheets vary
> widely in user interfaces: parameter separator (comma vs.
> semicolon), function names (displayed name often varies by locale),
> number syntax (what locale?), equal-to operator (= or ==),
> intersection operator (" " or "!"), and so on. The key is data
> interchange, not presentation, so Wheeler thinks we should work on
> defining how it's EXCHANGED as the scope.

Agreed. It would be nice if implementations agreed on the UI format for formulas, but that is outside the scope of this specification. In theory, an implementation could dispense with conventional formula strings altogether and have a visual formula design language.
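
A small sketch of what "interchange only" means in practice: the formula is stored once in a fixed form, and each application renders it however its UI prefers. The stored form here (`[.A1]` references, `;` between arguments) is modelled loosely on the OpenOffice.org syntax and is an assumption, as are the `UI_LOCALES` table and both helper functions; the German display name SUMME is just one example of locale-dependent presentation.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Ref:
    """A single-cell reference such as B4."""
    col: str
    row: int

@dataclass
class Call:
    """A function call, stored under its canonical (interchange) name."""
    name: str
    args: List["Node"] = field(default_factory=list)

Node = Union[Ref, Call]

# Hypothetical UI settings: these belong to each application, not to the spec.
UI_LOCALES = {
    "en": {"sep": ",", "names": {"SUM": "SUM"}},
    "de": {"sep": ";", "names": {"SUM": "SUMME"}},
}

def to_interchange(node: Node) -> str:
    """The single stored form (assumed syntax, loosely after OpenOffice.org)."""
    if isinstance(node, Ref):
        return f"[.{node.col}{node.row}]"
    return f"{node.name}(" + ";".join(to_interchange(a) for a in node.args) + ")"

def to_display(node: Node, locale: str) -> str:
    """One of many possible UI renderings of the same stored formula."""
    ui = UI_LOCALES[locale]
    if isinstance(node, Ref):
        return f"{node.col}{node.row}"
    name = ui["names"].get(node.name, node.name)
    return f"{name}(" + ui["sep"].join(to_display(a, locale) for a in node.args) + ")"
```

For `Call("SUM", [Ref("A", 1), Ref("B", 2)])` the file always holds `SUM([.A1];[.B2])`, while one UI shows `SUM(A1,B2)` and another `SUMME(A1;B2)`; only the first string is the spec's business.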

> * Test cases: Should we include test cases in the spec? Wheeler
> STRONGLY recommends it. Including test cases eliminates many
> problems of ambiguity in the text; Wheeler believes it is VERY
> difficult to write unambiguous text, but that well-written text
> accompanied by some clear test cases can together create an
> unambiguous specification that is easier to create and to read. In
> addition, including test cases makes it much easier to test and
> assure compliance in implementations. OpenFormula did this
> successfully, and the KSpread developers seemed to find the test cases useful.

I think it is also very difficult to write good, comprehensive test cases. So we have our work cut out for us either way.

Are you seeing the test cases as being normative? Keep in mind that OASIS does require that this specification be released in HTML and PDF formats, so the normative specification must be textual, i.e., printable. But interlinear test cases, as text in the spec, are a great idea.
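
One way interlinear test cases could double as a compliance check is sketched below. The case format, the `run_tests` and `round_half_away` helpers, and the expected values are hypothetical; the ROUND case assumes round-half-away-from-zero, which is what spreadsheets commonly do and which Python's built-in `round` (round-half-to-even) does not.

```python
import math

# Hypothetical test cases as they might appear in the spec text:
# (function name, arguments, expected result).
TEST_CASES = [
    ("ABS", (-2,), 2),
    ("ABS", (0,), 0),
    ("ROUND", (2.5, 0), 3),
]

def run_tests(impl, cases=TEST_CASES):
    """Run each case against `impl` (a name -> callable mapping); return failures."""
    failures = []
    for name, args, expected in cases:
        fn = impl.get(name)
        if fn is None:
            failures.append((name, args, "not implemented"))
            continue
        got = fn(*args)
        if not math.isclose(got, expected, rel_tol=1e-12):
            failures.append((name, args, got))
    return failures

def round_half_away(x, digits=0):
    """Round half away from zero (ignores binary-representation edge cases)."""
    q = 10 ** digits
    return math.copysign(math.floor(abs(x) * q + 0.5) / q, x)
```

`run_tests({"ABS": abs, "ROUND": round})` reports the ROUND case as a failure (Python returns 2.0), while swapping in `round_half_away` passes everything. A one-line test case catches a difference that prose like "rounds to the nearest integer" leaves ambiguous.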

> * Discuss use of Wiki. Do we want to try to put stuff in a Wiki and
> LATER transition text to ODF? Transition to an ODF document NOW?
> Transition some text now (e.g., what's in OpenFormula), use Wiki,
> and transition incrementally? One issue: The Wiki must be MoinMoin,
> and it's unclear if OASIS will install the MoinMoin math formula
> support. Without formula support, formulas may be harder to create.

I think areas where there is consensus on the technical content should be migrated into ODF format and edited there. But areas that are undergoing a more rapid rate of change are better off in the Wiki.

> * Syntax. All ODF-supporting spreadsheets support the basic
> OpenOffice.org syntax, e.g., [.B4]. Wheeler proposes that we use
> the OpenOffice.org syntax as a starting point; OpenFormula did this.
> However, we may need to add to the syntax as necessary (e.g., to
> support the cell union infix operator, empty parameters, or inline arrays).

What's the take on R1C1 versus A1 style addressing? OO does it one way, but I was under the impression that performance suffers because of this. Am I misinformed? Is this something we need to consider?
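
For reference, here is how the same reference looks in the two styles. The sketch parses a plain OpenOffice.org-style reference such as `[.B4]` (no sheet names, `$` markers or ranges) and rewrites it as a relative R1C1 reference from the cell holding the formula. The helper names are hypothetical, and the sketch says nothing about performance either way.

```python
import re

# A single-cell reference in the OpenOffice.org style, e.g. "[.B4]".
# Sheet names, absolute ($) markers and ranges are deliberately not handled.
OOO_REF = re.compile(r"^\[\.([A-Z]+)([0-9]+)\]$")

def column_number(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def parse_ooo_ref(text):
    """Return (row, column), both 1-based, for a reference such as "[.B4]"."""
    m = OOO_REF.match(text)
    if m is None:
        raise ValueError(f"unsupported reference: {text!r}")
    return int(m.group(2)), column_number(m.group(1))

def to_r1c1_relative(ref, host_row, host_col):
    """Express `ref` relative to the cell (host_row, host_col) holding the formula."""
    row, col = parse_ooo_ref(ref)
    dr, dc = row - host_row, col - host_col
    r = f"R[{dr}]" if dr else "R"
    c = f"C[{dc}]" if dc else "C"
    return r + c
```

`[.B4]` entered in C5 becomes `R[-1]C[-1]`, and `[.B5]` entered in C6 becomes the same string, which is why a column of copied formulas is textually identical in relative R1C1 form.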

> * Semantics. How strongly should we constrain semantics, and how
> should we determine them? For example, different applications use
> different rules for automatic conversion from text to number (e.g.,
> "3"+2). Some want very specific semantics defined; others want it
> looser. OpenFormula split the difference by allowing much variance
> at levels 1 and 2, but having stricter semantics at 3. We could
> avoid it (leave some semantics undefined or loosely defined), use
> levels, create a separate category ("strict" vs. "loose" semantics),
> or something else.

I'm opposed to syntactic or semantic differences between levels. I see no reason why a PDA that might have a subset of functions would expect a different syntax. Remember, the PDA user may be the same user who later syncs up with a desktop computer at the end of the day. The user would expect type promotion, string conversions, etc., to work the same in both places.
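
To illustrate the "3"+2 question from the quoted text, the sketch below implements two coercion policies for `+`. The `Conversion` enum, `FormulaError`, and the use of `#VALUE!` as the error are illustrative assumptions. Whichever policy is chosen, the point above is that it should not depend on level: the PDA and the desktop should make the same choice.

```python
from enum import Enum

class Conversion(Enum):
    STRICT = "strict"    # text operands of + are an error
    LENIENT = "lenient"  # numeric-looking text is converted to a number

class FormulaError(Exception):
    """Stands in for a spreadsheet error value such as #VALUE!."""

def to_number(value, policy):
    """Coerce one operand of + under the given conversion policy."""
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str) and policy is Conversion.LENIENT:
        try:
            # Caution: float() also accepts "inf", "nan" and surrounding
            # whitespace, so a real spec must pin down the accepted syntax.
            return float(value)
        except ValueError:
            pass
    raise FormulaError("#VALUE!")

def add(a, b, policy):
    """Spreadsheet-style a + b under the given policy."""
    return to_number(a, policy) + to_number(b, policy)
```

Under LENIENT, `add("3", 2, ...)` yields 5.0; under STRICT it raises `FormulaError`. A document that silently depends on one behavior gives different answers on an application using the other.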

Something else to throw in here, for lack of a better place: errors. I think we need to think this out carefully. Even though the UIs may only show ERR or a small number of error strings, we should consider letting our processing model support the full range of IEEE floating-point special values like NaN, Inf, -Inf, etc. Doing this will require that we also define the behavior of our functions when they take these values as parameters. XPath, for example, does this.
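
A minimal sketch of a processing model that keeps IEEE 754 special values internally and collapses them to ERR only at display time. `divide`, `show` and `spreadsheet_max` are hypothetical; the NaN-propagating rule for MAX is one possible choice, not a proposal.

```python
import math

def divide(a, b):
    """IEEE 754-style division: x/0 gives a signed infinity, 0/0 gives NaN.

    Python's own `/` raises ZeroDivisionError for a zero divisor, so the
    special cases are spelled out here.
    """
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # Sign of the infinity = sign(a) * sign(b), honouring a signed-zero divisor.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def show(x):
    """What a minimal UI might display: every non-finite value collapses to ERR."""
    return str(x) if math.isfinite(x) else "ERR"

def spreadsheet_max(*values):
    """One possible rule: any NaN argument makes the result NaN."""
    if any(math.isnan(v) for v in values):
        return math.nan
    return max(values)
```

The NaN rule is not a nicety: Python's built-in `max(1, math.nan)` returns 1 but `max(math.nan, 1)` returns nan, so leaving this undefined can produce argument-order-dependent results.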

> * Complex Numbers. We don't have time to go into it at the
> beginning of this process, but although many implementations support
> complex numbers, their support is really HORRIBLE to users. Later
> on we'll need to determine what we should do about them.

Perhaps not for version 1.0, though adding support for complex numbers, and even things like interval arithmetic, is an interesting idea. But I wouldn't expect either of these things to be led from a specification. Better to get someone to go out there and make a good implementation, demonstrate something that really works well, and then bring their requirements to OASIS for addition to the specification.

Follow-Ups:
- Re: [office-formula] Key Issues
  - From: "Tomas Mecir" <mecirt@gmail.com>

References:
- Key Issues
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>