office-comment message

Subject: OpenFormula (OpenOFormula?)- I'm working on a draft.
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: ca@chbs.dk
Date: Mon, 07 Feb 2005 02:23:00 -0500
Comment from: ca@chbs.dk
> 
> ...OpenDocument doesn't specify the formulas used in spreadsheets, so every spreadsheet vendor
> can implement formulas in their own way without being an open standard.
> This way a vendor can create lock-in to their spreadsheets.

I think this is so important that I'm taking my January 4, 2005
proposal and reformatting it into a separate standalone specification
for formulas, so I can send it in as a proposal to the committee.
The committee did agree that they'd at least post some materials
on their website.  I would really like them to do more; I
hope they'll take my document, change it as they like,
and include it by reference as a normative requirement.

I'm calling my proposal "OpenFormula".  I'm starting with the spreadsheet
tables, but I see no reason to have 12 different formula processors
that are "almost alike" -- that will just confuse the user --
so I'm going to see if I can unify the whole thing.

This previous comment scares me: "There are from our point of
view also no interoperability issues, because the namespace prefix
mechanism we have specified unambiguously specifies what syntax and
semantics are used for a formula".  Here's how I read that:
"Every implementation must reverse engineer
all other implementations' namespaces (they're not in the spec,
so everyone's free to invent their own private incompatible
namespaces).  Then, every implementation must
implement all the syntax and semantics of all
other implementations' namespaces for formulas, if they
wish to achieve interoperability.  And oh, by the way, your
implementation might not implement the namespace for the
document you're trying to load, so you may lose all the formulas."
I'm sure that's not what was meant, but that's how it
reads to me.  I hope that helps
explain why I think that the current formula information
in the OpenDocument specification is grossly inadequate.

My doc may get renamed to "OpenOFormula" (Open OpenDocument Formula)
or something else so that there are no name collisions worldwide.

Looking around at the OpenDocument specification, formulas
just haven't been handled well throughout -- and this is fixable
quickly if we hurry. There are completely independent
draw:formula and anim:formula specs which ARE specified
in more depth than the spreadsheet formulas.
The "Table Cell Content Validations" are also better-defined
than the actual formulas.
So you have a better shot at exchanging drawing formulas,
or exchanging validation rules for spreadsheet cells,
than for exchanging the actual equations for spreadsheets. Huh?

And there's also text:formula, which is sort of like
the table formulas except there's no requirement that
they be like each other.

Both table:operator and table:condition use "!=" for not-equal,
but OpenOffice internally uses "<>" for not-equal.
And no, that inconsistency is not documented in the current
specification, because the formula syntax as a whole
isn't really well-defined.

In short, the lack of a unified formula definition is a real
problem in the current spec, and it's too bad, because it's
not hard to at least make a first cut at fixing it so that
the syntax and basic operations (+, &, etc.) are defined.

In the longer term, it'll be very important to nail down
a long list of functions and the EXACT semantics of
everything (if you see a string when you were looking for
a number, what do you do?).  But let's at least get the
syntax down, and a short list of functions like SUM().
Many spreadsheets only use a few functions and simple
equations, and don't do things that would hit the edge cases.
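
The string-versus-number question can be made concrete with a small
Python sketch.  Both policies below are hypothetical; neither is taken
from any spreadsheet or from the draft.  They only show how two
plausible readings of SUM() give different answers for the same cells.

```python
def sum_skipping_text(cells):
    # Hypothetical policy A: ignore anything that is not a number.
    return sum(c for c in cells if isinstance(c, (int, float)))


def sum_converting_text(cells):
    # Hypothetical policy B: convert text to a number where possible,
    # and count text that cannot be converted as zero.
    total = 0
    for c in cells:
        if isinstance(c, (int, float)):
            total += c
        else:
            try:
                total += float(c)
            except ValueError:
                pass  # unconvertible text contributes nothing
    return total


cells = [1, "2", 3]
result_a = sum_skipping_text(cells)    # the text cell "2" is skipped
result_b = sum_converting_text(cells)  # the text cell "2" counts as 2
```

Under policy A the three cells sum to 4; under policy B they sum to 6.
A document that moves between two such implementations silently changes
its results, which is why the exact semantics eventually need to be
written down.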

Oh, a note about formula namespaces; please let me know
if my understanding here is incorrect.
Section 8.1.3 of OpenDocument Committee Draft 2, 21 Dec 2004,
says "Every formula should begin with a namespace prefix specifying the syntax 
and semantics used within the formula. Typically, the formula itself begins with 
an equal (=) sign and can include the following components..."
And 6.5.3 gives this example:
  text:formula='ooo-w:[address book file.address.FIRSTNAME] == "Julie"'
From this, I presume that if the first character of a formula
is a letter, it must be a namespace defining
some specialized formula processing system (keep reading until the
colon to find the namespace's name; stuff after the colon is the formula).
If the first character is an "=" sign, then everything after that
is the formula, and the "default"
formula processing system is used.  I think there should be
a single unified formula processing system (subject to the
limits of security issues), so that "+" doesn't change its
meaning unless you use a nonstandard namespace.
I also think there should be a standard name for the namespace
(I suggest the standard prefix "formula:"), so that if you
want you can state things more clearly.
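
A minimal Python sketch of the prefix rule as read above.  The function
name split_formula and the label "default" are made up for illustration;
the draft defines neither, and the rule itself is only my presumed
reading of sections 8.1.3 and 6.5.3.

```python
def split_formula(text):
    """Split a formula attribute value into (namespace, body).

    Assumed reading: a leading "=" selects the default formula system;
    a leading letter starts a namespace prefix that runs up to the
    first colon, and everything after that colon is the formula.
    """
    if text.startswith("="):
        return ("default", text[1:])
    if text[:1].isalpha():
        prefix, sep, body = text.partition(":")
        if sep:
            return (prefix, body)
    raise ValueError("cannot tell which formula system applies: %r" % text)


example = 'ooo-w:[address book file.address.FIRSTNAME] == "Julie"'
namespace, body = split_formula(example)
```

For the 6.5.3 example this yields the namespace "ooo-w" and the body
'[address book file.address.FIRSTNAME] == "Julie"'.  Knowing the
namespace is only the first step: the reader still has to know what
syntax and semantics that namespace implies, which is the part the
specification leaves out.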

--- David A. Wheeler
Follow-Ups:
- Re: [office-comment] OpenFormula (OpenOFormula?)- I'm working on a draft.
  - From: Gary Edwards <garyedwards@yahoo.com>
References:
- Public Comment
  - From: comment-form@oasis-open.org