office-formula message

Subject: Re: [office-formula] Grammar

From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: office-formula@lists.oasis-open.org
Date: Tue, 14 Mar 2006 22:02:54 -0500

I wrote:
>> 1. I think the syntax has to PERMIT empty parameters, because
>>    we have spreadsheet implementations where this is critical
>>    (and we want to be ABLE to read them).
>>     
>
>   
Eike Rathke wrote:
> Seconded. However, we'll have to define for each function what an empty parameter at that position actually means.
>   

Agreed.  I think we can state some sort of default.  In most cases it's
not allowed (maybe that's the default!), but when you need it, you need it.
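
To make the "default" idea concrete, here is a minimal Python sketch, not a proposal for the actual grammar.  It assumes ";" as the parameter separator, and the names split_args, check_args, and EMPTY_ALLOWED are hypothetical, as is the table's content.  The splitter keeps empty parameters so they can be read, and the checker rejects them unless the function explicitly permits an empty parameter at that position.

```python
# Hypothetical table: function name -> zero-based positions that may be empty.
# The entries are illustrative only; a real list would come from the spec.
EMPTY_ALLOWED = {"IF": {1, 2}}


def split_args(arg_text, sep=";"):
    """Split a top-level argument list; an empty parameter becomes None."""
    if not arg_text.strip():
        return []  # no arguments at all, e.g. the inside of "PI()"
    args, current, depth, in_string = [], [], 0, False
    for ch in arg_text:
        if ch == '"':
            in_string = not in_string
        if not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == sep and depth == 0:
                # Separator at nesting level 0 ends the current parameter.
                args.append("".join(current).strip() or None)
                current = []
                continue
        current.append(ch)
    args.append("".join(current).strip() or None)
    return args


def check_args(func, args):
    """Default rule: an empty parameter is an error unless explicitly allowed."""
    allowed = EMPTY_ALLOWED.get(func.upper(), set())
    for pos, arg in enumerate(args):
        if arg is None and pos not in allowed:
            raise ValueError(f"{func}: empty parameter {pos + 1} is not allowed")
    return args


print(check_args("IF", split_args("A1>0;;0")))  # ['A1>0', None, '0']
```

The point of the split is that the syntax can always carry an empty parameter, while whether it is legal is decided per function and per position.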

>> 2. Historically cell concatenation has been notated with the
>>    function separator (comma or semicolon).
>>     
>
> Actually, what implementations use in the UI, is that separator _and_
> the entire list enclosed in parentheses, e.g. =SUM((range1;range2)) is
> _one_ ReferenceList argument to the SUM function.
>   
I know, but that doesn't make it a good idea, and I notice that you 
agree...!

> I second that using some distinct operator in the file format would be
> easier to parse and less error prone. However, applications probably
> will still have to implement parsing the (...;...) notation for UI
> purposes because that's what people are used to, leading to duplicated
> effort. Just as a thought..
>
> Personally I'd prefer using an operator even in the UI, _because_ it is less error prone.
>   
Well, that's more likely if the operator is defined already.

In any case, if people want it that way in the UI that's fine (it's out 
of scope). But anything in the syntax that makes it more likely that 
there will be a miscommunication is a bad idea.  This is a trade-off 
between error detection/speed of reading, vs. making it look exactly 
like the current UI.  Either direction is defensible; the question is
which is best for the expected purpose.

>> 3. Rathke's grammar gets really complicated when it deals
>>    with referencelists, etc.  The problem is that it forbids
>>    some constructs that would make sense (e.g., you can't have
>>    cell concatenation outside a function call, even if you want
>>    to send the results to an operator).
>>     
>
> Is a ReferenceList anywhere used in that context? Anyway, being open to that possibility, even if not used nowadays, is actually a good thing.
>   

I believe in some implementations you can have a formula that is
ONLY a referencelist, for example:
=(A1:B2;C3)
You can then copy this formula around.  Implicit intersection is then
used to get the "correct" result in each cell.
And I bet several folks have done that, too. It's amazing what
people depend on.


>> 4. Cell column letters MUST be uppercase;
>>     
>
> Why? I don't see any other advantage in that except when doing diffs
> like you mentioned below.
>   

Doing diffs is a pretty good argument, yes?

>> I believe the OpenDocument spec specifically requires it.
>>     
>
> I didn't find anything that explicitly says so. Only the examples are
> kept in uppercase.
>   
See OpenDocument specification version 1.0, Section 8.3.1, "Referencing 
Table Cells".

Subsection "Absolute and relative cell addressing" defines "cellAddress" as:
        <param name="pattern">($?([^\. ']+|'[^']+'))?\.$?[A-Z]+$?[0-9]+</param>
Note that it's only [A-Z]+.

Subsection "Cell Range Address" defines "cellRangeAddress" as:
        <param name="pattern">($?([^\. ']+|'[^']+'))?\.$?[A-Z]+$?[0-9]+(:($?([^\. ']+|'[^']+'))?\.$?[A-Z]+$?[0-9]+)?</param>
Again, only [A-Z]+.

Even the examples are all uppercase.
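
The effect of the [A-Z]+ restriction can be checked with a short Python sketch.  This is a translation of the "cellAddress" pattern quoted above, not code from the specification: schema patterns use XML Schema regular-expression syntax, where "$" is an ordinary character and the whole value must match, so the Python version escapes "$" and uses fullmatch.  The name is_cell_address and the sample addresses are made up for illustration.

```python
import re

# Python translation of the quoted cellAddress pattern.  "$" is escaped
# because Python treats a bare "$" as an end-of-string anchor.
CELL_ADDRESS = re.compile(r"(\$?([^\. ']+|'[^']+'))?\.\$?[A-Z]+\$?[0-9]+")


def is_cell_address(text):
    """True if the whole string matches the cellAddress pattern."""
    return CELL_ADDRESS.fullmatch(text) is not None


for candidate in [".A1", "Sheet1.$B$2", "'My Sheet'.C3", ".a1", "Sheet1.b2"]:
    print(candidate, is_cell_address(candidate))
```

The first three candidates match and the two with lowercase column letters do not.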

> relying on the XML stream being identical after a write back is.. well..
> maybe not appropriate.
Actually, once you run two streams through the same pretty-printer, it
makes sense.  There's a W3C specification specifically for this purpose,
and there's already work ongoing that does this with OpenDocument, see:
http://www2-data.informatik.unibw-muenchen.de/People/borghoff/pspapers/doceng2005.pdf
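
A rough Python sketch of the diff argument follows.  It assumes the W3C specification in question is Canonical XML, and uses the standard library's canonicalize function (Python 3.8 or later) as the "pretty-printer".  The element and attribute names are invented for the example and are not the real OpenDocument ones.

```python
from xml.etree.ElementTree import canonicalize

# Three serializations of what is meant to be the same cell (hypothetical markup).
a = '<cell formula="=SUM(A1:B2)" type="float"/>'
b = "<cell  type='float'   formula='=SUM(A1:B2)'></cell>"  # different quoting and order
c = '<cell formula="=sum(a1:b2)" type="float"/>'            # lowercase formula text


def same_after_canonicalization(x, y):
    """Compare two XML strings after canonicalizing both."""
    return canonicalize(x) == canonicalize(y)


print(same_after_canonicalization(a, b))  # True
print(same_after_canonicalization(a, c))  # False
```

Canonicalization removes differences in attribute order, quoting, and empty-element syntax, but it cannot know that "=sum(a1:b2)" and "=SUM(A1:B2)" mean the same thing.  If the format fixes the case, the formulas compare equal without a formula-aware diff tool.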
>  If we start being picky on this, there's a SHOULD
> write uppercase column names and function names, and MUST be able to
> read case insensitive.
>   
We could do that.  For function names I think that's fine.  For column 
names, that's a trade-off, and I worry about doing that. Allowing column 
names to be input in lower case makes some inputs okay (that would 
otherwise not be), but we lose the ability to detect some named 
expressions that unfortunately look like cell addresses.  It'd be nice 
if "Qtr3" was UNAMBIGUOUSLY a named expression; undetected errors in 
spreadsheets can be VERY dangerous.  Either choice is defensible, on the 
grounds that either works.
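
The "Qtr3" case can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: the pattern covers a bare cell address with no sheet prefix, and classify is a hypothetical helper, not part of any proposed grammar.

```python
import re

# Bare cell address: optional "$", column letters, optional "$", row digits.
CELL = r"\$?[A-Z]+\$?[0-9]+"


def classify(token, case_insensitive_columns):
    """Say how a reader would interpret the token under each rule."""
    flags = re.IGNORECASE if case_insensitive_columns else 0
    if re.fullmatch(CELL, token, flags):
        return "cell reference"
    return "named expression"


print(classify("Qtr3", case_insensitive_columns=False))  # named expression
print(classify("Qtr3", case_insensitive_columns=True))   # cell reference
```

With uppercase-only column letters, "Qtr3" can only be a named expression.  Once lowercase is accepted on input, it also reads as row 3 of column QTR, and the reader needs some other rule to choose between the two.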

--- David A. Wheeler
