office-formula message

Subject: Re: [office-formula] Grammar

From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: office-formula@lists.oasis-open.org
Date: Fri, 03 Mar 2006 15:00:09 -0500

The more I look at it, the more that I think that "simplest is best".

The OpenFormula grammar is slightly more complicated than it would
"normally" be, because it tries very hard to separate the levels.
The grammar Eike posted is happily very similar (yay!), but I think
it is also overly complex. For example, data type mismatches are best
handled by typing, not by the syntax. We can also handle whitespace
separately, and we shouldn't try to create a large independent set of
syntax for references.

Instead, I think a simple (single) grammar that identifies ALL
operators (even ones that a particular implementation might not have)
would be best. We would then separately explain that, obviously, you
don't need to be able to parse the syntax of an operator you don't
support at all (the result is the same no matter what: an error).
I think we're better off depending on typing, rather than syntax, to
detect type mismatches. The result should be a REALLY simple, regular
grammar (to the relief of parser writers everywhere).
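
To make the idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the operator table, the FormulaError class, and the names tokenize, parse, evaluate, and calc are all hypothetical and are not taken from OpenFormula or from the posted grammar. One parser accepts every operator; whitespace is dropped by the tokenizer; unsupported operators and type mismatches both surface as error values at evaluation time, not as syntax errors. Malformed input is not handled.

```python
import re

# Illustrative operator table (an assumption, not the OpenFormula list).
# The parser accepts ALL of these, whether or not they can be evaluated.
PRECEDENCE = {"=": 1, "&": 2, "+": 3, "-": 3, "*": 4, "/": 4, "^": 5, "!": 6}


class FormulaError:
    """Hypothetical error value returned instead of a result."""

    def __init__(self, reason):
        self.reason = reason


def divide(a, b):
    return FormulaError("division by zero") if b == 0 else a / b


# Operators this hypothetical implementation actually supports.
SUPPORTED = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": divide,
}

# Whitespace is skipped by the tokenizer, so the grammar never sees it.
TOKEN = re.compile(r'\s*(?:(\d+(?:\.\d+)?)|"([^"]*)"|(\S))')


def tokenize(text):
    tokens = []
    for match in TOKEN.finditer(text):
        number, string, symbol = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif string is not None:
            tokens.append(("str", string))
        else:
            tokens.append(("op", symbol))
    return tokens


def parse(tokens):
    """Precedence climbing; every binary operator is left-associative."""
    pos = 0

    def primary():
        nonlocal pos
        kind, value = tokens[pos]
        pos += 1
        if kind == "op" and value == "(":
            node = expr(1)
            pos += 1  # skip the closing ")"
            return node
        return (kind, value)

    def expr(min_prec):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos][0] == "op"
               and PRECEDENCE.get(tokens[pos][1], 0) >= min_prec):
            op = tokens[pos][1]
            pos += 1
            right = expr(PRECEDENCE[op] + 1)
            left = ("binop", op, left, right)
        return left

    return expr(1)


def evaluate(node):
    if node[0] in ("num", "str"):
        return node[1]
    _, op, left, right = node
    if op not in SUPPORTED:
        # Parsed fine; the implementation simply cannot evaluate it.
        return FormulaError("unsupported operator " + op)
    a, b = evaluate(left), evaluate(right)
    for operand in (a, b):
        if isinstance(operand, FormulaError):
            return operand
    if not (isinstance(a, float) and isinstance(b, float)):
        # Typing, not syntax, catches the mismatch.
        return FormulaError("type mismatch")
    return SUPPORTED[op](a, b)


def calc(text):
    return evaluate(parse(tokenize(text)))
```

In this sketch, calc('1 + "a"') and calc("2 ^ 3") both parse without complaint and both return a FormulaError; only the reason differs.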

The result should be simpler than either OpenFormula or the
recently posted grammar.

One issue: neither grammar specifically identifies the NLF intersection
operator.  However, "!" is the cell intersection operator for ranges
in OpenFormula, e.g.:
  [.A1:.G20]![.A1:.Z2]
I don't see why NLF intersection needs a syntactically separate
operator.  It's the same operation (intersection).  One approach:
if you actually want to use NLF, you just predefine the relevant
"NamedExpression"s from the row and column labels. Then, when you
use a named expression, it "just works", e.g.:
   Summer ! Sales
Hate that? Another approach would be to use string names, i.e.,
intersection just happens to accept two different types (string names
and cell references), like this:
  "Summer" ! "Total Sales"
I like the NamedExpression approach better, though.
In my mind, that's just an option on the spreadsheet that has the
side-effect of automatically maintaining a set of NamedExpressions
based on the labels of the top row and left column.
Is there a reason that wouldn't work?
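
A rough Python sketch of the NamedExpression approach follows. Everything in it is an assumption made for illustration: ranges are modeled as 0-based (top, left, bottom, right) tuples, and the helper names intersect and names_from_labels are hypothetical. The point is only that one intersection operation serves both plain ranges and label-derived names.

```python
def intersect(a, b):
    """Intersect two inclusive ranges given as (top, left, bottom, right)."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    if top > bottom or left > right:
        return None  # the ranges do not overlap
    return (top, left, bottom, right)


def names_from_labels(grid):
    """Derive named ranges from the top row and left column labels.

    Each column label names the data cells below it; each row label
    names the data cells to its right.
    """
    names = {}
    last_row, last_col = len(grid) - 1, len(grid[0]) - 1
    for col in range(1, last_col + 1):
        names[grid[0][col]] = (1, col, last_row, col)
    for row in range(1, last_row + 1):
        names[grid[row][0]] = (row, 1, row, last_col)
    return names


# Made-up sample data, only for this sketch.
grid = [
    ["",       "Sales", "Costs"],
    ["Spring", 100,     60],
    ["Summer", 150,     80],
]

names = names_from_labels(grid)

# "Summer ! Sales" becomes an ordinary range intersection.
summer_sales = intersect(names["Summer"], names["Sales"])
row, col = summer_sales[0], summer_sales[1]
summer_sales_value = grid[row][col]

# The same function handles [.A1:.G20]![.A1:.Z2], which is A1:G2.
a1_g20 = (0, 0, 19, 6)
a1_z2 = (0, 0, 1, 25)
a1_g2 = intersect(a1_g20, a1_z2)
```

Here summer_sales is the single cell holding 150, and no operator beyond "!" is involved.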

The "second =" marker is an interesting approach.
It appears to be the "opposite" approach of
having a set of volatile functions,
and in fact the two approaches should complement
each other nicely.  I guess I don't fundamentally
object to it, if it seems useful, though it's clearly
only for very advanced users.
I'm not sure how exactly to specify its semantics, though.


> Yes. In fact the cell "union" is not a union, but a list instead. A union would unify overlapping ranges, the list does not. (A1:A3;A2) evaluates A2 twice.

Yes, I even noted that in the OpenFormula text.  I don't like the term "union"; it's very misleading, but that seems to be what people call it.  I originally called it "reference concatenation" but no one knew what that meant...!  The problem with "list" is that there are many other kinds of lists.  Any other names?
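
The difference is easy to see in a few lines of Python. This is only a sketch: the cells helper and the sample values are made up, and cell references are modeled as plain strings.

```python
def cells(column, first_row, last_row):
    """Expand a single-column range into a list of cell names."""
    return [f"{column}{row}" for row in range(first_row, last_row + 1)]


a1_a3 = cells("A", 1, 3)
a2 = cells("A", 2, 2)

# What (A1:A3;A2) actually does: concatenate the references, keeping A2 twice.
as_list = a1_a3 + a2

# What a true union would do: drop the duplicate, keeping first-seen order.
as_union = list(dict.fromkeys(a1_a3 + a2))

# Made-up cell values, to show that the two readings give different sums.
values = {"A1": 10, "A2": 20, "A3": 30}
sum_of_list = sum(values[name] for name in as_list)
sum_of_union = sum(values[name] for name in as_union)
```

With these values, summing over the list counts A2 twice, so the two sums differ by exactly the value of A2.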

I don't think we need to support more built-in types for constants in the SYNTAX (except _maybe_ complex numbers).  Most types should be derivable from Number, String, and logical values, when sent through functions.  E.g., dates are derivable from DATE.  We can still support arbitrary types... you just need a function call syntax to get them.
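
As a small Python sketch of the DATE case: the syntax only ever sees Number constants, and the function call produces the date. The epoch below (1899-12-30) is an assumption chosen for illustration; it is not something this message specifies, and implementations differ.

```python
from datetime import date

# ASSUMPTION: day 0 is 1899-12-30. Other epochs are possible.
EPOCH = date(1899, 12, 30)


def DATE(year, month, day):
    """Build a date from three Numbers; the result is itself a Number
    (days since the assumed epoch), so no date literal syntax is needed."""
    return float((date(year, month, day) - EPOCH).days)


new_year_2000 = DATE(2000, 1, 1)

# Because the result is a plain Number, ordinary arithmetic works on it.
one_day = DATE(2006, 3, 4) - DATE(2006, 3, 3)
```

Displaying such a Number as a date is then a formatting question, separate from the formula syntax.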

--- David A. Wheeler

Follow-Ups:
- Re: [office-formula] Grammar
  - From: Eike Rathke <erack@sun.com>

References:
- Grammar
  - From: Eike Rathke <erack@sun.com>
- Re: [office-formula] Grammar
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>
- Re: [office-formula] Grammar
  - From: Eike Rathke <erack@sun.com>