office-formula message

Subject: Grammar


Here's some grammar I developed having the capabilities of Excel and
OOoCalc in mind. Most things are similar to OpenFormula, some are
a little bit different. If we had a wiki I'd have placed it there.


Grammar used

The grammar uses the same EBNF notation as
[http://www.w3.org/TR/REC-xml/#sec-notation  XML], with the exception
that grammar symbols always have initial capital letters. This makes
them clearly recognizable as names in unformatted text. Expressions are
parsed by first dividing the character string into tokens and then
parsing the resulting sequence of tokens. Whitespace (#x20) can be used
freely between expressions for readability.

Definition of the Formula Attribute

FormulaContent          ::=     Namespace Formula

Namespace               ::=     Namespace_in_XML ':'
Namespace_in_XML        ::=     http://www.w3.org/TR/REC-xml-names

The namespace tells the reading application how to treat formula content
written by a specific application or an application conforming to
a certain dialect of this formula language. For OpenDocument files
written by OpenOffice.org versions prior to this specification (versions
2.0, 2.0.1, 2.0.2, 2.0.3, ...?) it is

TODO: definition of namespace, suggestion: xmlns:odff="..."

Formula                 ::=     '=' '='? S* Expression S* Expression*

If a second '=' is present, the formula has to be recalculated whenever
one of its predecessors changes value. This can be used to force
recalculation of formula cells that contain calls to macros or AddIns
with side effects. If no second '=' is present, the cell can be
recalculated at any time when needed.
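
A reader could detect the optional second '=' with something like the
following Python sketch. The function name is hypothetical, and the
sketch assumes the Namespace prefix has already been stripped from the
attribute value.

```python
def split_formula_prefix(formula):
    """Return (always_recalculate, body) for a formula string.

    Assumption: the Namespace prefix was removed beforehand, so the
    text starts with '=' or '=='.
    """
    if not formula.startswith("="):
        raise ValueError("a formula must start with '='")
    always_recalculate = formula.startswith("==")
    body = formula[2:] if always_recalculate else formula[1:]
    # Only #x20 counts as whitespace in this grammar.
    return always_recalculate, body.strip(" ")
```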

WhiteSpace (S)
S                       ::=     #x20

Expression              ::=     Number |
                                String |
                                Array |
                                PrefixOp S* (Expression - String) |
                                (Expression - String) S* PostfixOp |
                                Expression S* InfixOp S* Expression |
                                '(' S* Expression S* ')' |
                                FunctionName S* '(' S* ParameterList? S* ')' |
                                Reference |

Number                  ::=     [0-9]+ ('.' [0-9]+)? ([eE] [-+]? [0-9]+)?

As in the "C" or en-US locale, the '.' dot is used as the decimal
separator; group (AKA thousands) separators are not written. "E" or "e"
denotes scientific notation. Readers should be able to read a fraction
that starts with '.' without a leading zero, as there may be
implementations that do not write a leading zero on such numbers.
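
The difference between what is written and what a reader should accept
can be sketched in Python with two patterns. The names STRICT_NUMBER,
LENIENT_NUMBER and read_number are made up for this example; the strict
pattern is the Number production, the lenient one additionally accepts
a missing leading zero.

```python
import re

# The Number production as given in the grammar.
STRICT_NUMBER = re.compile(r"[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?")

# Same, but a fraction may start with '.' (recommended for readers).
LENIENT_NUMBER = re.compile(r"([0-9]+(\.[0-9]+)?|\.[0-9]+)([eE][-+]?[0-9]+)?")


def read_number(text, lenient=True):
    """Convert a Number token to a float, rejecting anything else."""
    pattern = LENIENT_NUMBER if lenient else STRICT_NUMBER
    if pattern.fullmatch(text) is None:
        raise ValueError(f"not a valid Number: {text!r}")
    return float(text)
```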

String                  ::=     '"' ([^"#x00-#x1f] | '""')* '"'

A literal double-quote character (") as string content is escaped by
duplicating it. All content is UTF-8 encoded.
Note that since the formula is stored as an XML attribute, all
double-quotes are written as the entity &quot;
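
The two escaping layers (doubling inside the String literal, then
entity escaping for the XML attribute) can be sketched in Python as
follows. The helper names are hypothetical; escape() is the standard
xml.sax.saxutils function, which also takes care of '&', '<' and '>'.

```python
from xml.sax.saxutils import escape


def quote_string(value):
    """Build a String literal: double every embedded double-quote."""
    return '"' + value.replace('"', '""') + '"'


def unquote_string(literal):
    """Recover the content of a String literal, validating its shape."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a String must be enclosed in double-quotes")
    inner = literal[1:-1]
    if any(ord(ch) < 0x20 for ch in inner):
        raise ValueError("control characters are not allowed in a String")
    # After removing doubled quotes, no lone quote may remain.
    if '"' in inner.replace('""', ""):
        raise ValueError("unescaped double-quote inside String")
    return inner.replace('""', '"')


def to_attribute_text(formula):
    """Escape formula text for storage in an XML attribute value."""
    return escape(formula, {'"': "&quot;"})
```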

Array                   ::= TODO, which separators?

PrefixOp                ::=     '+' | '-'

Unary operators.

PostfixOp               ::=     '%'

Unary percentage operator, dividing the preceding expression by 100.

InfixOp                 ::=     ArithmeticOp | ComparisonOp | '&'

The '&' ampersand is the string concatenation operator.
Note that since the formula is stored as an XML attribute, an '&'
ampersand is written as the entity &amp;

ArithmeticOp            ::=     '+' | '-' | '*' | '/' | '^'

Addition, Subtraction, Multiplication, Division, Exponentiation.

ComparisonOp            ::=     '=' | '<>' | '<' | '>' | '<=' | '>='

EqualTo, UnequalTo, LessThan, GreaterThan, LessThanOrEqualTo,
GreaterThanOrEqualTo.

FunctionName            ::=     Identifier

Note that in practice a FunctionName is normally stored using its
English form rather than the translated UI representation, and thus
conforms to
[A-Za-z] [A-Za-z0-9_.]*
However, in theory all letter characters as defined for an Identifier are allowed.

Identifier              ::=     LetterXML (LetterXML | DigitXML | '_' | '.' |
LetterXML               ::=     http://www.w3.org/TR/REC-xml/#NT-Letter
DigitXML                ::=     http://www.w3.org/TR/REC-xml/#NT-Digit
CombiningCharXML        ::=     http://www.w3.org/TR/REC-xml/#NT-CombiningChar

ParameterList           ::=     Parameter ( S* ';' S* Parameter )*

Parameter               ::=     Expression | ReferenceList

ReferenceList           ::=     '(' S* Reference ( S* ';' S* Reference )* S* ')'

A ReferenceList as a single argument is only accepted by spreadsheet
functions that can handle a cell range at that parameter position.
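
Because a ReferenceList uses the same ';' separator as the
ParameterList, a reader has to track nesting to find the parameter
boundaries. A rough Python sketch, with a hypothetical function name
and operating on the text between a function's parentheses:

```python
def split_parameters(text):
    """Split a ParameterList at top-level ';' separators.

    Separators inside parentheses, brackets, or String literals are
    left alone, so a ReferenceList stays together as one parameter.
    """
    parts = []
    depth = 0
    in_string = False
    start = 0
    for i, ch in enumerate(text):
        if ch == '"':
            # A doubled quote toggles twice, so the state is unchanged.
            in_string = not in_string
        elif in_string:
            continue
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == ";" and depth == 0:
            parts.append(text[start:i].strip(" "))
            start = i + 1
    parts.append(text[start:].strip(" "))
    return parts
```

Note that an empty input yields a single empty string, so a caller
would check for the optional ParameterList being absent first.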

Reference               ::=     CellReference |
                                RangeReference |
                                Intersection |
                                ColumnLabel |

Intersection            ::=     Reference S* '!' S* Reference |
                                ColumnLabel S+ RowLabel |
                                RowLabel S+ ColumnLabel

ColumnLabel             ::=     TODO

RowLabel                ::=     TODO

RangeReference          ::=     CellReference ':' CellReference |
                                '[' RangeAddress ']' |
        TODO: whitespace if range operator with name,
        but no whitespace if with cell addresses.

RangeAddress            ::=     CellAddress ':' CellAddress

CellReference           ::=     '[' CellAddress ']' | NamedCellReference

CellAddress             ::=     SheetName? '.' ColumnName RowNumber

If a CellAddress points to a sheet, column, or row that has been deleted,
the corresponding part of the address is set to '#REF' instead. For
example, #REF.A1 referred to a sheet that has since been deleted.
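
Splitting a CellAddress into its parts, including the '#REF' marker,
might look like the Python sketch below. Since SheetName is still TODO,
the sketch simply assumes a sheet name is any run of characters other
than '.', '[' and ']'; that assumption and the function name are not
part of the grammar.

```python
import re

# Assumed shape: optional sheet, '.', column, row; each part may be '#REF'.
CELL_ADDRESS = re.compile(
    r"(?P<sheet>[^.\[\]]+)?\.(?P<column>[A-Za-z]+|#REF)(?P<row>[0-9]+|#REF)"
)


def parse_cell_address(text):
    """Return (sheet, column, row) as strings; sheet is None if absent."""
    match = CELL_ADDRESS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a CellAddress: {text!r}")
    return match.group("sheet"), match.group("column"), match.group("row")


def is_broken(parts):
    """True if any part of the address refers to something deleted."""
    return "#REF" in parts
```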

ColumnName              ::=     [a-zA-Z]+

Column names are A..Z, AA..ZZ, AAA..ZZZ, ...
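
The column naming scheme is a bijective base-26 numbering. A Python
sketch of the conversion in both directions, assuming columns are
counted from 1 (A = 1) and using made-up function names:

```python
def column_index(name):
    """Convert a ColumnName such as 'AA' to its 1-based index."""
    if not (name.isascii() and name.isalpha()):
        raise ValueError(f"not a ColumnName: {name!r}")
    index = 0
    for ch in name.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_name(index):
    """Convert a 1-based column index back to its name."""
    if index < 1:
        raise ValueError("column index must be at least 1")
    letters = []
    while index > 0:
        # Subtract 1 first because there is no zero digit in this scheme.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))
```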

RowNumber               ::=     [0-9]+

SheetName               ::=     TODO, including external reference

NamedExpression         ::=     NameIdentifier

A NamedExpression can contain any Expression valid in formulas.
TODO: exceptions?

NameIdentifier          ::=     Identifier - CellAddress - RangeAddress

NamedReference          ::=     NamedCellReference | NamedRangeReference

NamedCellReference      ::=     NameIdentifier

A NamedCellReference is a special case of a NamedExpression and contains
exactly one CellReference.

NamedRangeReference     ::=     NameIdentifier

A NamedRangeReference is a special case of a NamedExpression and
contains exactly one RangeReference.

TODO: operator precedence

Data Types in Parameters and Return Values

NumericValue            ::= Number

DateSerial              ::= NumericValue

A DateSerial is the number of days since a given base date. Time is
expressed in fractions of a day, 12 hours == 0.5 days. The base date is
specified in [TODO: ODF reference]
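
The DateSerial arithmetic can be sketched in Python as below. The base
date is still TODO above, so ASSUMED_BASE_DATE (1899-12-30) is only a
placeholder picked for the example, and the function names are
hypothetical as well.

```python
from datetime import datetime, timedelta

# Placeholder only: the real base date is not yet specified in this draft.
ASSUMED_BASE_DATE = datetime(1899, 12, 30)


def serial_to_datetime(serial, base=ASSUMED_BASE_DATE):
    """Days since the base date; the fractional part is the time of day."""
    return base + timedelta(days=serial)


def datetime_to_serial(moment, base=ASSUMED_BASE_DATE):
    """Inverse conversion: elapsed time expressed in (fractional) days."""
    return (moment - base) / timedelta(days=1)
```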

NumericArgument         ::= NumericValue | Reference

TODO: more definitions

Spreadsheet Functions

SUM( Arg1 [; Arg2]... )
Arg1 .. Arg#: NumericArgument
Result: NumericValue

The SUM function sums the values of all arguments. Arguments must be of
type NumericArgument. In the case of a Reference, the referenced cells
containing numeric values are summed. If an argument is not
a NumericValue or refers to a cell that has no numeric content, the
behavior is implementation dependent.
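
One possible reading of these SUM semantics, as a Python sketch. A
resolved Reference is modelled as a list of cell values, which is an
assumption of the example. For the implementation-dependent cases this
sketch skips non-numeric cells inside a Reference and rejects
non-numeric direct arguments; other choices are equally allowed.

```python
from numbers import Real


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def formula_sum(*args):
    """Sum NumericArguments: plain numbers or lists of cell values."""
    total = 0.0
    for arg in args:
        if isinstance(arg, (list, tuple)):
            # A Reference: only cells with numeric content contribute.
            total += sum(value for value in arg if _is_number(value))
        elif _is_number(arg):
            total += arg
        else:
            raise TypeError(f"not a NumericArgument: {arg!r}")
    return total
```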
