OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

 


Help: OASIS Mailing Lists Help | MarkMail Help

office-formula message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]


Subject: Re: [office-formula] Grammar


Hi David,

On Sun, Mar 12, 2006 at 19:54:47 -0500, David A. Wheeler wrote:

>  Formula                ::=     '=' ForceRecalc? Expression
>  ForceRecalc            ::=     '='
> 
> NOTE: I changed "Expression Expression*" to just "Expression" here.
> If you want a list of expressions, that should turn into a single
> expression anyway.
> 
> If the second '=' is present, this is a "forced recalculate" hint.
> Typical implementations optimize what is recalculated; they typically
> only recalculate what is displayed and what those values depend on
> (transitively).  A forced recalculate hints that this formula should be
> recalculated whenever recalculation takes place, even if it isnt'
> apparantly needed (e.g., due to side-effects of macros).

Not "whenever recalculation takes place", which would mean that each
time something is calculated also all forced formulas would have to be
calculated. Instead, a forced formula is recalculated whenever one of
its predecessors it depends on changed, not only if the result is
displayed or accessed.


>  Expression             ::=     Number |
>                                 String |
>                                 Array |

I think we need another type of constant: ErrorValue.
In some applications it is possible to enter =#NUM! or similar and
a corresponding error result is produced. Of course function results
also may deliver an ErrorValue, which in fact is propagated as an
argument to the next function in the calculation chain.


>  Array                   ::= TODO, which separators?

http://www.oasis-open.org/committees/download.php/16826/openformula-spec-20060221.html#Extension:_In-line_arrays

proposed semicolon ';' as the separator between elements of one vector
(or "row"), and pipe symbol '|' as the seperator between rows. Any other
opinions?


>  ReferenceOp             ::=  IntersectionOp | CellConcatenationOp | 
> RangeOp
>  IntersectionOp          ::=  '!'
>  CellConcatentationOp    ::=  '~'
>  RangeOp                 ::=  ':'
> 
> These are operations on references: Intersection (which some user
> interfaces display as a space), cell concatenation
> (which some user interfaces display as the function parameter seperator),
> and range extension (given two ends, creates a range that covers both ends).

Additional note: the result may be a 3D range, with front-upper-left and
back-lower-right corners.


>  Reference ::= '[' Source? RangeAddress ']'
>  RangeAddress ::=  CellAddress ( ':' CellAddress )? |
>                    SheetName "." ColumnLabel ':' SheetName "." 
> ColumnLabel |
>                    SheetName "." RowLabel ':' SheetName "." RowLabel
>  CellAddress ::= SheetName "." ColumnLabel RowLabel
>  SheetName   ::= /* empty */ | '$' ( [A-Za-z0-9]+ | ("'" ([^']|"''")+ 
> "'" ) )

The '$' does not make sense there. It would force every sheet reference
to be absolute.

I think we have to elaborate on the SheetName. A SheetName doesn't
necessarily have to be quoted as long as it doesn't contain dot '.' or
space ' ' or apostrophe "'". See also the ODF cellAddress definition in
8.3.1 Referencing Table Cells

Following that cellAddress definition would lead to:

SheetName   ::= [^\. ']+ | "'" [^']+ "'"

However, as you already pointed out in your definition above, embedded
apostrophes should be allowed if doubled, or otherwise escaped, e.g. \'

SheetName   ::= [^\. ']+ | "'" ([^'] | "''")+ "'"

would be more appropriate. Alas, this is currently not covered by ODF.
TODO: clarify with TC.

Excel doesn't allow a few certain characters in sheetnames, would had to
lookup which. Do we want to deprecate them for interoperability?

Furthermore, should we restrict the list of allowed characters to
LetterXML, DigitXML and such, like in an Identifier? Are control
characters allowed?


> The syntax above supports whole-row and whole-column refernces, but they
> must always have a range marker.
> Note that
> 
> TODO: Ideally we'd allow Sheetnames to not start with "$", but then
> if given ['...,  is that a start of a SOURCE or a SHEETNAME?

You don't know until you reach the end of the quote: 'abc'#'def'.A1 is
a Source followed by a CellAddress, 'abc'.A1 is a CellAddress.


> TODO: R1C1 and relative sheetnames for CellAddress?

Relative SheetName yes, absolutely ;-)  Otherwise it would break things.
R1C1 probably not, it isn't covered by ODF at all, and you noted that
writing different notations (A1 vs. R1C1) would hamper diffs.


> TODO: Errors.

What do you mean, errors in references?


> [... R1C1 ...]
> However, A1 notation is much easier for humans to
> understand, so using A1 format is likely to increase reliability
> (because it is more likely to be correctly generated and interpreted).
> If this is a serious problem, Array formulas can usually be used to
> achieve the same results.

Note that range references in repeated single formulas act different
from range references in array formulas regarding the implicit
intersection calculation, so that statement doesn't necessarily hold
true.


> *Rationale:* Supporting whole-rows and whole-columns is useful;
> Excel 2003 supports this. OpenOffice 2.0 does not directly
> support it, but it fakes it very well by translating a "whole row" or
> "whole column" entry into a specific range. As long as spreadsheets only
> go from A1 to IV65536, there is no difference, and OpenOffice.org 2.0's
> approach is probably fine for many people. However, spreadsheet
> implementations of the future are likely to support larger limits, at
> which point these spreadsheets will quietly fail. It would be wiser to
> directly support these selectors so that large spreadsheets of the
> future would not silently fail.

That probably needs a change of the ODF cellRangeAddress definition,
which currently does not allow constructs like A:A or 2:2
TODO: clarify with TC


>  NamedExpression         ::=     Source? Identifier
>  Source ::= "'" URI "'" "#"
> 
> A NamedExpression references another expression, possibly in a completely
> different spreadsheet.

Not only restricted to spreadsheet documents. It may at least reference
any document type that can be imported into a spreadsheet. Also other
protocols than file/http/ftp may be supported.


> TODO: In Excel, NamedExpressions can be sheet-specific or global, but
> there's no syntax to reference one or the other (a potential problem for
> external references in particular).  Should we do better?

You can't decide that on a formula-attribute basis. AFAIK current ODF
doesn't provide means to distinguish between global and sheet-local
expressions.


> The URI in the source_location MUST support absolute URIs (URLs are
> URIs). Implementations SHOULD support relative URLs,

I think that should be a MUST instead, otherwise you're not able to pass
on a set of related documents without forcing the reciever into
replicating the hierarchy of your file system..

> which can be
> distinguished because they do not begin with [A-Za-z]+ ":". If a
> relative URL looks like an absolute URL, it MUST save relative URLs
> using a "./" prefix, e.g., './mystuff.xls'.

Shouldn't that be more something like "a relative-path reference as
described in §5 of RFC2396" similar to ODF "17.5 Usage of URIs Within
Packages"?


> TODO: Discuss IRIs?

I think they have to be supported somehow. Wasn't there some discussion
about IRIs in the TC?

> TODO: exceptions?

Needs some more thoughts, I guess.. after vacation ;-)

I'm parting off here right now, though I'm only half way through..

Meet you again next month.

  Eike


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]