office-formula message

Subject: Re: [office-formula] Syntax - recent changes, please look over..
From: Eike Rathke <erack@sun.com>
To: office-formula@lists.oasis-open.org
Date: Mon, 24 Apr 2006 21:36:21 +0200

Hi David,

On Sun, Apr 23, 2006 at 14:20:11 -0400, David A. Wheeler wrote:

> The syntax has stayed very stable; I think the primary
> reason is that the current syntax handles the "normal" cases
> quite nicely.  However, we need to handle ALL cases, even the
> truly weird ones.  Please take a look and kibitz/improve what's there!

> I've noted repeatedly that I want this DONE by May 2.

As already said, I doubt we'll have the entire syntax done by that date.
I have only limited time to spend right now, and with only you and me
working on it I don't think we'll have everything clarified by then.
Maybe we should first agree on the "normal" cases by then and leave
ALL the rest for later?

> I've made a few changes to the proposed syntax on the Wiki;
> please take a look, and see if you agree with them.

Regarding the #REF vs. #REF! change: should we really take care of
lex/flex-based lexers there? What about #REF.#REF#REF then? And how does
that fit with the Error values below?

Error value definitions:
IMHO #DIV/0 is not necessary as input; it can always be generated by
=1/0. Furthermore, as a formula =#DIV/0 could be ambiguous.

> Here are the changes, in a nutshell:
> * Restored the ability to use '$' in front of the column or row,
>    and added it to the sheetname too.  Somehow that got removed among
>    the other changes, it was there before.

Seems I removed the '$' from the SheetName when I cleaned out the
ASCII-only definition there.

> * Bare sheetnames can't include # or $ -- you have to enclose them with '..'.
>    At the very least, it's a problem if they can START with those characters,
>    because then you can't tell if you have an error or non-relative sheet,
>    or a funny sheetname.  I'm thinking that perhaps we should be more
>    restrictive about bare sheetnames anyway, maybe just limit them to
>    Identifier characters.  In fact, that may be important for lexing (I'll see
>    about that soon). Thoughts, anyone?

Restricting bare unquoted sheet names to Identifier characters (as
defined by us) in general probably goes in the right direction.
However, this may clash with the definition in ODF 8.3.1, "Absolute
and relative cell addressing", which does not restrict bare names in any
other way than [^\. '], which IMHO is not sufficient.
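
A minimal sketch of the difference, in Python. The first pattern is the
[^\. '] class quoted above; the second is only a hypothetical stand-in for
an Identifier-style rule (a letter or underscore followed by letters,
digits or underscores), not the definition from the Wiki. The function
names are made up for illustration.

```python
import re

# Bare names as permitted by the character class quoted from ODF 8.3.1:
# one or more characters other than '.', ' ' and "'".
ODF_BARE = re.compile(r"[^\. ']+")

# Hypothetical stricter rule (an assumption, not the committee's Identifier):
# a letter or underscore, then letters, digits or underscores.
IDENT_BARE = re.compile(r"[^\W\d]\w*")


def is_bare_odf(name):
    """True if name may appear unquoted under the [^\\. '] rule."""
    return ODF_BARE.fullmatch(name) is not None


def is_bare_ident(name):
    """True if name may appear unquoted under the stricter stand-in rule."""
    return IDENT_BARE.fullmatch(name) is not None


for candidate in ["Sheet1", "$Sheet1", "#REF!", "My Sheet"]:
    print(candidate, is_bare_odf(candidate), is_bare_ident(candidate))
```

Under the first rule both $Sheet1 and #REF! count as bare sheet names,
which is exactly the case where a reader cannot tell an error or an
absolute sheet marker from a funny sheet name.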


> * ":" (when outside [..]) is now a top-precedence operator,
>    to handle stuff like [.A1:B3]:[.X6].  This is required when the
>    cell ranges are NOT constants but instead are named expressions
>    or function results (e.g., MYFUNC1():MYFUNC2()).

Ok. That made me think about the precedence of Range, Union and
Intersection, which we currently define in that order (btw, I think we
should write the table in the opposite order for clarity on 'highest'
and 'lowest' priority). Several sources about Excel (e.g.
http://support.microsoft.com/kb/25189/EN-US/ ) claim it to be Range,
Intersection, Union instead. Given that their Union operator is the ','
comma, the same character as the function parameter separator, that
somehow makes sense. Verifying it means trying out some combinations,
which I haven't done yet.
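
To see what is at stake, here is a small precedence-climbing sketch in
Python that fully parenthesizes a reference expression under either
ordering. Assumptions: ':' is Range, and '~' and '!' are placeholder
symbols for Union and Intersection chosen for this example only; both
orderings are read as running from tightest to loosest binding; all
operators are left-associative; the input is well-formed.

```python
import re


def parenthesize(expr, precedence):
    """Return expr fully parenthesized; precedence maps operator -> strength."""
    tokens = re.findall(r"[A-Za-z0-9]+|[:!~]", expr)
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]          # operand: a cell name such as A1
        pos += 1
        # Consume operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(precedence[op] + 1)   # left-associative
            left = f"({left}{op}{right})"
        return left

    return parse(0)


# Placeholder symbols: ':' Range, '~' Union, '!' Intersection.
LISTED_ORDER = {":": 3, "~": 2, "!": 1}   # Range, Union, Intersection
EXCEL_ORDER = {":": 3, "!": 2, "~": 1}    # Range, Intersection, Union

print(parenthesize("A1:B2~C1!D4", LISTED_ORDER))
print(parenthesize("A1:B2~C1!D4", EXCEL_ORDER))
```

The same token sequence groups as (((A1:B2)~C1)!D4) under the first table
and as ((A1:B2)~(C1!D4)) under the second, so the choice changes results
whenever Union and Intersection are mixed without parentheses.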


> * I hooked in "AutomaticIntersection" so it could actually be used.
>    It had been defined earlier by someone else, but was never used.

Erm, pardon? I didn't get that.. hooked in where? I don't see any
related change in the diffs. On the other hand that reminds me that
I didn't finish that section yet and the implicit intersection is yet to
come..


> * I inserted an Array syntax.  Think of this more as a stub... we need one.
>    Comments there, or improvements, would be ESPECIALLY appreciated.

First: is there any application that allows Expressions instead of
constants in inline-arrays? I think supporting Expressions there would
overcomplicate things.


> The current syntax for AutomaticIntersection is very peculiar.
> It means that this is a string:
>   "Hello"
> But single quotes means it's going to be specially interpreted as
> a reference to a value identifying a row or column:
>   'Hamburg' !! 'Sales'
> So there's a SERIOUS difference between ' and ".

Sure, a string is a literal string, and a quoted identifier is not ;-)

> Eike, can you explain this?  I'd _prefer_ having just String representation,
> and then using !! as yet another operator, so it'd look like this:
>   "Hamburg" !! "Sales"
> Can anyone help me understand why that is a bad idea?

The easiest explanation is an example without the intersection operator,
i.e. an implicit intersection: given a column of values labeled XXX on
top of all values, placing the formula ='xxx' somewhere beneath that
column displays the value of the very same row. Writing that as a string
="xxx" would of course display the literal string xxx instead.
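
A rough Python sketch of that lookup, under stated assumptions: the sheet
is modeled as a dict from (row, column) to value, labels are matched
case-insensitively (inferred from XXX versus 'xxx' above), and
resolve_label is a hypothetical helper, not part of any application.

```python
def resolve_label(sheet, label, formula_row):
    """Return the value in the labeled column at the formula's own row.

    sheet maps (row, col) -> value. The label cell is found by a
    case-insensitive text match, which is an assumption of this sketch.
    """
    for (row, col), value in sheet.items():
        if isinstance(value, str) and value.lower() == label.lower():
            return sheet.get((formula_row, col))
    raise KeyError(label)


# Column 0 carries the label XXX on top of three values.
sheet = {(0, 0): "XXX", (1, 0): 10, (2, 0): 20, (3, 0): 30}

# ='xxx' evaluated in row 2 picks the value of the very same row.
print(resolve_label(sheet, "xxx", 2))

# ="xxx" is just the literal string; no lookup takes place.
print("xxx")
```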

A literal string, once any doubled quote characters standing for
a literal quote have been collapsed to one, should never need any further
processing before the formula is interpreted and calculated.
"Hamburg"!!"Sales" would violate that, whereas 'Hamburg'!!'Sales' fits
perfectly well into other addressing schemes like sheet names and
external sources. That is the difference between quoted literal
strings and quoted identifiers.
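
The only processing a quoted token needs can be sketched in a few lines
of Python. The unquote helper is hypothetical; it assumes the token has
already been delimited by the lexer and that a doubled quote character
inside it stands for one literal quote.

```python
def unquote(token, quote='"'):
    """Strip the enclosing quotes and collapse doubled quote characters."""
    if len(token) < 2 or token[0] != quote or token[-1] != quote:
        raise ValueError(f"not a {quote}-quoted token: {token!r}")
    return token[1:-1].replace(quote * 2, quote)


# A literal string: after unquoting it is a finished value.
print(unquote('"say ""hi"""'))

# A quoted identifier: after unquoting it is a name that still has to be
# resolved (sheet name, label, external source) when the formula runs.
print(unquote("'It''s'", quote="'"))
```

The unquoting step is identical for both kinds of token; what differs is
that the string is done at that point, while the identifier is only the
input to a later lookup.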


> I plan to try to convert this into a flex/bison implementation.
> I think it's important that the syntax be easy to implement without
> a lot of complex state manipulations using typical tools.

IMHO we should not base the syntax on available tools. Being
tool-friendly is of course nice and tempting, but "sophisticated"
features may require solutions that simple tools can't accommodate..

  Eike

-- 
Automatic string conversions considered dangerous. They are the GOTO statements
of spreadsheets.  --Robert Weir on the OpenDocument formula subcommittee's list.