[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [office-formula] Grammar
Eike Rathke wrote: > Hi, > > Here's some grammar I developed having the capabilities of Excel and > OOoCalc in mind. Most things are similar to OpenFormula, some are > a little bit different. Thanks for posting this! Here are a few comments trying to compare this with OpenFormula. Hopefully we can combine the features of these grammars to create the world's best grammar :-). > If we had a wiki I'd placed it there.. > Understood :-(. I've sent an email asking for status-of-the-Wiki. > Definition of the Formula Attribute > > FormulaContent ::= Namespace Formula > In OpenFormula the namespace was considered external to the syntax, but that's really a presentational nit; the result is identical. Since ODF mandated this, that's not surprising :-). Note that there are LOTS of other places where ODF uses formula-like expressions; I'd like to be able to expand and cover those too, or at least make it easy for implementations to reuse code for them. > > Formula ::= '=' '='? S* Expression S* Expression* > > If a second '=' is present, the formula has to be recalculated whenever > one of its predecessors changes value. This can be used to force formula > cells to be recalculated that contain calls to macros or AddIns with > side effects. If no second '=' is present, the cell can be recalculated at any time when needed. > This second "=" is odd; I don't see the value of this capability. Could you enlighten me? In my mind, shouldn't implementations figure out when to recalculate, rather than trying to embed this in the syntax? In most implementations I'm aware of, formulas are normally ALWAYS recalculated if their predecessors change (assuming automatic recalc is on). But if you have manual recalc on, then this second "=" shouldn't have an effect anyway. For those few functions where figuring out the predecessor is "too hard" (aka the "volatile functions"), they'll always recalc, but that's a trade-off between implementation complexity and run-time effort that in my mind. If an implementor WANTS to write more code to be clever about reducing recalculation time, they should be allowed to do so. Contrast this with OpenFormula section 5.2: "Implementations of OpenFormula recalculate formulas when its information is needed. Typical implementations will note what values a formula depends on, and when those dependent values are changed, it will re-execute the formulas that depend on them to produce the new results (choosing the formulas in the right order based on their dependencies). Implementations may recalculate when a value changes (this is termed /automatic recalculation/) or on user command (this is termed /manual recalculation/)." "Some functions' dependencies are difficult to determine and/or should be recalculated more frequently. These include functions that return today's date or time, random number generator functions (such as RAND <http://www.openformula.org/RAND>), or ones that indirectly determine the cells to act on. Many implementations /always/ recalculate formulas including such functions whenever a recalculation occurs. Functions that are /always/ recalculated whenever a recalculation occurs are termed /volatile/ functions. Functions that are often volatile functions include AREAS <http://www.openformula.org/index.php?title=AREAS&action=edit>, CELL <http://www.openformula.org/index.php?title=CELL&action=edit>, COLUMNS <http://www.openformula.org/COLUMNS>, INDEX <http://www.openformula.org/INDEX>, INDIRECT <http://www.openformula.org/index.php?title=INDIRECT&action=edit>, NOW <http://www.openformula.org/NOW>, OFFSET <http://www.openformula.org/index.php?title=OFFSET&action=edit>, RAND <http://www.openformula.org/RAND>, ROWS <http://www.openformula.org/ROWS>, and TODAY <http://www.openformula.org/TODAY>." > > WhiteSpace (S) > S ::= #x20 > Many spreadsheet implementations, including Excel, also allow newline (\n). Obviously in an XML attribute newlines are converted into their XML-escaped form. > > Expression ::= Number | > String | > Array | > PrefixOp S* (Expression - String) | > (Expression - String) S* PostfixOp | > Expression S* InfixOp S* Expression | > '(' S* Expression S* ')' | > FunctionName S* '(' S* ParameterList? S* ')' | > Reference | > NamedExpression > OpenFormula's _presentation_ of spaces is different, but the result APPEARS to be the same. I disagree with the use of (Expression-String) everywhere, though. OpenFormula doesn't do that. Strings are expressions, and should be treated as expressions with data type string. If a particular OPERATOR doesn't like the string type, that's a data type error... not a syntax error. The OpenFormula syntax uses the term "formula_variable" instead of "NamedExpression", and in comparison I think that's a weakness of OpenFormula's naming. I think "NamedExpression" is a far more accurate name for that. > Number ::= [0-9]+ ('.' [0-9]+)? ([eE] [-+]? [0-9]+)? > > According to the "C" or en-US locale, the '.' dot is used as the decimal > separator, group (AKA thousand) separators are not written. "E" or "e" > denote scientific notation. It is advisable that readers are able to > read a fraction that starts with '.' without a leading zero, as there > may be implementations that don't write a leading zero on such numbers. > OpenFormula has the same idea. It requires the ability to READ leading-"." numbers, and includes the syntax. > > String ::= '"' ([^"#x00-#x1f] | '""')* '"' > > A literal double-quote character (") as string content is escaped by > duplicating it. All content is UTF-8 encoded. > Note that since the formula is stored as an XML attribute, all > double-quotes are written as their entity " > I don't think we want to mandate UTF-8 encoding... that's really the business of the enclosing XML. > Array ::= TODO, which separators? > OpenFormula does this: "Here semicolon is used as the separator between values in a row (again, so different locales will have a simpler time entering data when entering data), and the pipe symbol "|" is proposed as the symbol separating rows (with absolutely no precedent)." It also supports multi-arrays with "_"... I don't remember that being IN there, so it probably should come out :-). > PrefixOp ::= '+' | '-' > > Unary operators. > OpenFormula calls these "unary_op"... they should probably be called "PrefixOp" since a postfix op is ALSO a unary op. > > PostfixOp ::= '%' > > Unary percentage operator, dividing the preceding expression by 100. > Yup. Everyone seems to do it this way. > InfixOp ::= ArithmeticOp | ComparisonOp | '&' > > The '&' ampersand is the string concatenation operator. > Note that since the formula is stored as an XML attribute, an '&' > ampersand is written as the entity & > This treats cell intersection as NOT an infix operator, which I think is suboptimal. By trying to separate the cell intersection operator syntactically, precedence is dealt with nonuniformly. > > ArithmeticOp ::= '+' | '-' | '*' | '/' | '^' > > Addition, Substraction, Multiplication, Division, Exponentiation. > > > ComparisonOp ::= '=' | '<>' | '<' | '>' | '<=' | '>=' > > EqualTo, UnequalTo, LessThan, GreaterThan, LessThanOrEqualTo, > GreaterThanOrEqualTo. > > > FunctionName ::= Identifier > > Note that in practice a FunctionName normally is stored using its > English form and not the translated UI representation, thus conforms to > [A-Za-z] [A-Za-z0-9_.]* > However, in theory all letter characters as defined for an Identifier are allowed. > > > Identifier ::= LetterXML (LetterXML | DigitXML | '_' | '.' | > CombiningCharXML)* > LetterXML ::= http://www.w3.org/TR/REC-xml/#NT-Letter > DigitXML ::= http://www.w3.org/TR/REC-xml/#NT-Digit > CombiningCharXML ::= http://www.w3.org/TR/REC-xml/#NT-CombiningChar > > ParameterList ::= Parameter ( S* ';' S* Parameter )* > > Parameter ::= Expression | ReferenceList > Why this distinction? Shouldn't a "referenceList" be an expression? > ReferenceList ::= '(' S* Reference ( S* ';' S* Reference )* S* ')' > > A ReferenceList as one argument is only accepted by spreadsheet > functions that handle a cell range at this parameter place. > Is this a "cell union"? If so, the double-use of ";" to separate function parameters AND to make a union is awkward. > > Reference ::= CellReference | > RangeReference | > Intersection | > ColumnLable | > RowLable > > Intersection ::= Reference S* '!' S* Reference | > ColumnLable S+ RowLable | > RowLable S+ ColumnLable > Here the syntax treats "!" specially. I think it'd be easier to keep the syntax uniform if we just treat "!" as yet-another-operator that happens to work on references. Otherwise, it'll be easy to create a syntax that duplicates all sorts of crazy things. > ColumnLable ::= TODO > > RowLabel ::= TODO > > RangeReference ::= CellReference ':' CellReference | > '[' RangeAddress ']' | > NamedRangeReference > TODO: whitespace if range operator with name, > but no whitespace if with cell addresses. > Interesting. Like OpenFormula, this means both [.A1:.A2] and [.A1]:[.A2] are legal. I think the latter is not currently accepted by OOo2, but we need it to support some capabilities. > RangeAddress ::= CellAddress ':' CellAddress > > CellReference ::= '[' CellAddress ']' | NamedCellReference > > CellAddress ::= SheetName? '.' ColumnName RowNumber > > If a CellAddress points to a sheet or column or row that got deleted, > the corresponding part of the address is set to '#REF' instead. For > example #REF.A1 was referring a now deleted sheet. > > > ColumnName ::= [a-zA-Z]+ > > Column names are A..Z, AA..ZZ, AAA..ZZZ, ... > OpenFormula _mandates_ uppercase; there's really no need to allow variance. EVERYBODY sends column names in uppercase, and doing this allows detection of problems (like variable names accidentally considered to be cell addresses). > NameIdentifier ::= Identifier - CellAddress - RangeAddress > Too harsh. QTR4 is a CellAddress, and I expect spreadsheets to continue to widen over the years. > TODO: operator precedence > OpenFormula proposed an operator precedence hierarchy. This is easier to do if you consider operators like "!" and ":" to be simply infix operators. > Data Types in Parameters and Return Values > > NumericValue ::= Number > > DateSerial ::= NumericValue > We need these, but is there a need for a special syntactic representation here? I don't know of a reason to do so. --- David A. Wheeler
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]