Hi,

Here's some grammar I developed having the capabilities of Excel and OOoCalc in mind. Most things are similar to OpenFormula, some are a little bit different. If we had a wiki I'd have placed it there.

  Eike


Grammar used

The grammar uses the same EBNF notation as [http://www.w3.org/TR/REC-xml/#sec-notation XML], with the exception that grammar symbols always have initial capital letters. This makes them clearly recognizable as names in unformatted text.

Expressions are parsed by first dividing the character sequence into tokens and then parsing the resulting sequence of tokens. Whitespace #x20 can be freely used between expressions for readability.


Definition of the Formula Attribute

FormulaContent ::= Namespace Formula
Namespace ::= Namespace_in_XML ':'
Namespace_in_XML ::= http://www.w3.org/TR/REC-xml-names

The namespace tells the reading application how to treat formula content written by a specific application or an application conforming to a certain dialect of this formula language. For OpenDocument files written by OpenOffice.org versions prior to this specification (versions 2.0, 2.0.1, 2.0.2, 2.0.3, ...?) it is

xmlns:oooc="http://openoffice.org/2004/calc"

TODO: definition of namespace, suggestion: xmlns:odff="..."

Formula ::= '=' '='? S* Expression S* Expression*

If a second '=' is present, the formula has to be recalculated whenever one of its predecessors changes value. This can be used to force recalculation of formula cells that contain calls to macros or AddIns with side effects. If no second '=' is present, the cell can be recalculated at any time when needed.

WhiteSpace (S)

S ::= #x20

Expression ::= Number
             | String
             | Array
             | PrefixOp S* (Expression - String)
             | (Expression - String) S* PostfixOp
             | Expression S* InfixOp S* Expression
             | '(' S* Expression S* ')'
             | FunctionName S* '(' S* ParameterList? S* ')'
             | Reference
             | NamedExpression

Number ::= [0-9]+ ('.' [0-9]+)? ([eE] [-+]? [0-9]+)?

According to the "C" or en-US locale, the '.'
dot is used as the decimal separator; group (AKA thousand) separators are not written. "E" or "e" denotes scientific notation. It is advisable that readers are able to read a fraction that starts with '.' without a leading zero, as there may be implementations that don't write a leading zero on such numbers.

String ::= '"' ([^"#x00-#x1f] | '""')* '"'

A literal double-quote character (") as string content is escaped by duplicating it. All content is UTF-8 encoded. Note that since the formula is stored as an XML attribute, all double-quotes are written as their entity &quot;

Array ::= TODO, which separators?

PrefixOp ::= '+' | '-'

Unary operators.

PostfixOp ::= '%'

Unary percentage operator, dividing the preceding expression by 100.

InfixOp ::= ArithmeticOp | ComparisonOp | '&'

The '&' ampersand is the string concatenation operator. Note that since the formula is stored as an XML attribute, an '&' ampersand is written as the entity &amp;

ArithmeticOp ::= '+' | '-' | '*' | '/' | '^'

Addition, Subtraction, Multiplication, Division, Exponentiation.

ComparisonOp ::= '=' | '<>' | '<' | '>' | '<=' | '>='

EqualTo, UnequalTo, LessThan, GreaterThan, LessThanOrEqualTo, GreaterThanOrEqualTo.

FunctionName ::= Identifier

Note that in practice a FunctionName is normally stored using its English form and not the translated UI representation, and thus conforms to

[A-Za-z] [A-Za-z0-9_.]*

However, in theory all letter characters as defined for an Identifier are allowed.

Identifier ::= LetterXML (LetterXML | DigitXML | '_' | '.'
              | CombiningCharXML)*
LetterXML ::= http://www.w3.org/TR/REC-xml/#NT-Letter
DigitXML ::= http://www.w3.org/TR/REC-xml/#NT-Digit
CombiningCharXML ::= http://www.w3.org/TR/REC-xml/#NT-CombiningChar

ParameterList ::= Parameter ( S* ';' S* Parameter )*
Parameter ::= Expression | ReferenceList
ReferenceList ::= '(' S* Reference ( S* ';' S* Reference )* S* ')'

A ReferenceList as one argument is only accepted by spreadsheet functions that handle a cell range at this parameter position.

Reference ::= CellReference | RangeReference | Intersection | ColumnLabel | RowLabel
Intersection ::= Reference S* '!' S* Reference
               | ColumnLabel S+ RowLabel
               | RowLabel S+ ColumnLabel
ColumnLabel ::= TODO
RowLabel ::= TODO
RangeReference ::= CellReference ':' CellReference
                 | '[' RangeAddress ']'
                 | NamedRangeReference

TODO: whitespace if range operator with name, but no whitespace if with cell addresses.

RangeAddress ::= CellAddress ':' CellAddress
CellReference ::= '[' CellAddress ']' | NamedCellReference
CellAddress ::= SheetName? '.' ColumnName RowNumber

If a CellAddress points to a sheet or column or row that got deleted, the corresponding part of the address is set to '#REF' instead. For example, #REF.A1 referred to a sheet that has since been deleted.

ColumnName ::= [a-zA-Z]+

Column names are A..Z, AA..ZZ, AAA..ZZZ, ...

RowNumber ::= [0-9]+
SheetName ::= TODO, including external reference

NamedExpression ::= NameIdentifier

A NamedExpression can contain any Expression valid in formulas. TODO: exceptions?

NameIdentifier ::= Identifier - CellAddress - RangeAddress
NamedReference ::= NamedCellReference | NamedRangeReference
NamedCellReference ::= NameIdentifier

A NamedCellReference is a special case of a NamedExpression and contains only one single CellReference.

NamedRangeReference ::= NameIdentifier

A NamedRangeReference is a special case of a NamedExpression and contains only one single RangeReference.
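
To make the lexical rules above a bit more concrete, here is a minimal Python sketch of the Number, String and CellAddress productions as regular expressions. It is only an illustration, not part of the proposal. The names (NUMBER, NUMBER_LENIENT, unescape_string, column_index, parse_cell_address) are made up for this example, and since SheetName is still a TODO, the sketch simply assumes a plain ASCII word or '#REF' in that position. Only a deleted sheet is modelled with '#REF'; deleted columns or rows are not handled.

```python
import re

# Number ::= [0-9]+ ('.' [0-9]+)? ([eE] [-+]? [0-9]+)?
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")

# Lenient reader variant that also accepts a fraction without a leading
# zero (".5"), as the text advises readers to do.
NUMBER_LENIENT = re.compile(
    r"(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"
)

# String ::= '"' ([^"#x00-#x1f] | '""')* '"'
STRING = re.compile(r'"(?:[^"\x00-\x1f]|"")*"')

# CellAddress ::= SheetName? '.' ColumnName RowNumber
# Assumption: SheetName is an ASCII word or '#REF' (the grammar leaves it TODO).
CELL_ADDRESS = re.compile(
    r"(?P<sheet>[A-Za-z0-9_]+|#REF)?\.(?P<col>[a-zA-Z]+)(?P<row>[0-9]+)"
)


def unescape_string(token: str) -> str:
    """Strip the surrounding quotes and collapse doubled double-quotes."""
    if STRING.fullmatch(token) is None:
        raise ValueError(f"not a String token: {token!r}")
    return token[1:-1].replace('""', '"')


def column_index(name: str) -> int:
    """Map a ColumnName to a 1-based index: A=1, Z=26, AA=27, ..."""
    index = 0
    for ch in name.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell_address(text: str):
    """Return (sheet or None, column index, row number) for a CellAddress."""
    match = CELL_ADDRESS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a CellAddress: {text!r}")
    return (
        match.group("sheet"),
        column_index(match.group("col")),
        int(match.group("row")),
    )
```

With these definitions, parse_cell_address("Sheet1.B12") yields the sheet name together with column 2 and row 12. The string handed to unescape_string is the attribute value after XML entity decoding, so the parser sees plain double-quotes rather than entities.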

TODO: operator precedence


Data Types in Parameters and Return Values

NumericValue ::= Number
DateSerial ::= NumericValue

A DateSerial is the number of days since a given base date. Time is expressed in fractions of a day, 12 hours == 0.5 days. The base date is specified in [TODO: ODF reference]

NumericArgument ::= NumericValue | Reference

TODO: more definitions


Spreadsheet Functions

SUM( Arg1 [; Arg2]... )

Arg1 .. Arg#: NumericArgument
Result: NumericValue

The SUM function sums the values of all arguments. Arguments must be of type NumericArgument. In case of a Reference, the referenced cells containing numeric values are summed. If an argument is not a NumericValue or refers to a cell that has no numeric content, the behavior is implementation dependent.
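
As a rough illustration of DateSerial and SUM, here is a small Python sketch. It is not a normative definition and makes several assumptions: the base date is still a TODO above, so 1899-12-30 is picked purely as a placeholder; a Reference is modelled as a plain Python list of cell values; and since the handling of non-numeric content is implementation dependent, this sketch chooses to skip non-numeric cells inside a Reference and to reject non-numeric direct arguments. The function names are made up for the example.

```python
from datetime import datetime

# Assumption: placeholder base date, the text leaves the real one as a TODO.
BASE_DATE = datetime(1899, 12, 30)


def to_date_serial(moment: datetime, base: datetime = BASE_DATE) -> float:
    """Days since the base date; time of day becomes the fractional part."""
    return (moment - base).total_seconds() / 86400.0


def is_numeric(value) -> bool:
    """True for int/float cell content; booleans are not treated as numbers here."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sum_function(*args) -> float:
    """Sketch of SUM( Arg1 [; Arg2]... ) over numbers and references (lists)."""
    total = 0.0
    for arg in args:
        if isinstance(arg, list):
            # Reference: only cells with numeric content are summed.
            total += sum(v for v in arg if is_numeric(v))
        elif is_numeric(arg):
            total += arg
        else:
            # Implementation dependent; this sketch chooses to reject it.
            raise TypeError(f"not a NumericArgument: {arg!r}")
    return total
```

For example, with the placeholder base date, noon on 1899-12-31 maps to 1.5, matching "12 hours == 0.5 days", and sum_function(1, 2.5, [3, "text", None, 4]) adds up only the numeric values.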