Subject: Re: [office] Please review syntax of formula work (Michael Brauer)
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: Michael Brauer - Sun Germany - ham02 - Hamburg <Michael.Brauer@Sun.COM>, office-formula@lists.oasis-open.org
Date: Tue, 22 Aug 2006 12:55:42 -0400
Michael Brauer:
> I took a look at chapter 5 of the recent draft. My feedback is below....
>
> Please note further that although the list below seems to be long, I 
> in general did like what I read.
Great!  Thanks so much for doing that.  Here are my thoughts.
I'd love to hear others' comments, if any; otherwise I'll take the comments
we've received and modify the syntax section.

> - There are multiple instances of "  " (two blank characters). One 
> should be sufficient.
OK.  Two spaces are expected at the end of a sentence, but elsewhere they
were probably added by a translator; we should fix that.  Of course, if
that's the worst problem, I'm delighted :-).

>
> - References to external work should make us of the bibliography (see 
> for instance reference to 
> http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-notation.
> in the introduction to chapter 5.
Agreed.  I'm using OOo, and I'm not familiar with its bibliography mechanisms.
Can anyone give me a pointer on how best to do that "properly"?

> - The OpenDocument specification currently refers to XML 1.0 (Third 
> edition). Is there are reason why OpenFormula refers to XML 1.1?
Only because it was the latest one at the time of writing.
Since we're only referencing the BNF definition, I don't see that
they need to reference the same document. And since I imagine a future
version of OpenDocument will reference XML 1.1, I think there
are reasons to leave this as 1.1 for now.

That said, I haven't done the comparison, but I suspect it doesn't matter.
Is it important that they reference the same version in this case?

(oops, but see below.)

> - The OpenDocument specification uses a fixed font for everything that
> may appear literally in a document and for inline examples. Maybe the 
> formula specification should do the same. I think this may in 
> particular be useful for the symbols of the BNF grammar (a few 
> examples are below).
The BNF _itself_ uses fixed font.
> - The draft contains some explanations that help to understand why
> certain decisions have been made (and that do not have a background), 
> but that are not required to implement the specification. An example 
> is in Section 5.7, the paragraph starting with
>
> "This naming scheme enables different applications to innovate without
> interfering with each other or with standard functions. ...".
>
> My suggestion would be to remove these kind information from the 
> normative specification document, but to have an annotated 
> non-normative version that contains this information.

Okay, we'll do that.

However, this particular capability is really important.  I think we should
add a few sentences to the introductory matter listing the advantages of
this specification, including this capability, so that readers will know WHY
they might want to choose it.  A specification should briefly explain what
it's good for, so that potential users and implementors know why they might
choose to implement all that detail.


> - Is it an option to move the test cases into a separate chapter? I 
> think this may be advantageous for to reasons. First, I believe most 
> readers of the specification are not interested in the test cases, but 
> in a compact representation of the topic. Second, Implementors 
> probably want to create test suites that cover all test cases, and 
> therefore benefit from having them in one place, too.
Anything is POSSIBLE, but I think we should NOT do that.
It would be easier to move the test cases for the syntax, but then the syntax
chapter would be inconsistent with the function definitions (which DO place
them together, and I think it's ESPECIALLY important we place them
together for functions).  And I think having the test cases in the same place
actually helps readers, even for syntax: BNFs are usually easier to understand
when you see examples, and that's just what the test cases provide.  Note that
they're formatted as tables, so it's really easy for a reader to skip them
if they DON'T want to see them.

An additional complication is that we're presenting a broad syntax, but
not all applications will implement every part.  And the plan is to
list section titles so that we can specify what is implemented.
The sub-chapters try to show that.  If there's a better way to do that,
I'd be delighted.

Another possibility would be to exploit the power of OpenDocument itself.
I expect that this spec will be released in OpenDocument format.
We could easily add a variable to allow people to hide the test cases.
The result would not be the normative spec, but if they just want to breeze
through the spec without seeing all the detail, that might be helpful.

> Chapter 5
>
> "When this occurs, various characters (such as "<", ">", '"', and "&") 
> must be escaped, as described in the XML specification. In particular, 
> the less-than symbol "<" is represented as &lt;, the double-quote 
> symbol is represented as &quot;, and the ampersand symbol is 
> represented as &amp;."
>
> Suggestion:
>
> "When this occurs, various characters (such as "<", ">", '"', and "&") 
> must be escaped, as described in section 2.4 of XML specification 
> ([XML1.0]). In particular, the less-than symbol "<" must be 
> represented as &lt; (or as a numeric character reference), the 
> double-quote symbol as &quot;, and the ampersand symbol as &amp;.".

I like it.  It should probably say "or a numeric character reference" at the
end of that list, since any of those characters can be written as a numeric
character reference.
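
To make the escaping rule concrete, here is a minimal Python sketch that
escapes a formula for storage in an XML attribute and reads it back.  The
element name "cell" and the attribute name "formula" are hypothetical
stand-ins chosen to keep the example short (they are not the real
OpenDocument names), and the sample formula is made up.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def escape_formula_attr(formula):
    """Escape a formula for use inside a double-quoted XML attribute value."""
    # escape() handles "&", "<" and ">"; the extra mapping adds the double quote.
    return escape(formula, {'"': "&quot;"})

formula = '=IF([.A1]<5;"low";"a&b")'   # made-up sample formula
escaped = escape_formula_attr(formula)

# An XML parser undoes the escaping, so the stored formula round-trips.
# "cell" and "formula" are hypothetical names, used only for illustration.
element = ET.fromstring(f'<cell formula="{escaped}"/>')
round_tripped = element.get("formula")

# A numeric character reference is an equally valid spelling of "<".
numeric = ET.fromstring('<cell formula="=1&#60;2"/>').get("formula")
```

Here `escaped` contains &lt;, &quot; and &amp; in place of the raw
characters, while `round_tripped` is the original formula again.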

That gives us a reason to reference XML 1.0, for consistency with OpenDocument.
Then again, I think there's no difference between 1.0 and 1.1 here.
Should we stick with 1.1, anticipating a switch, or revert to 1.0?
> Section 5.1
>
> "The optional namespace tells ..."
>
> Suggestion:
>
> "An optional namespace prefix tells ..."
>
> In general, I think most occurrences of "Namespace" in this section 
> actually should be "namespace prefix".

Yes, you're right.
> Did the SC consider to remove namespace prefix from the grammar, and 
> to describe it instead in a/the section that describes how formulas 
> are embedded into OpenDocument documents?
To be honest, we've gone back and forth on that, because it's one of those
presentation issues that doesn't make any real difference technically.
What you're seeing here is "the last guy with the pen in this section won" :-).
I think that was Eike in this case, though maybe I'm to blame :-).

If there's some REASON to do it one way or the other, I'd love to do it
the "better" way.  Can you suggest a rationale for removing it from the grammar?

> "When used in OpenDocument attributes table:formula and text:formula, 
> applications should not include this namespace, as it is unnecessary. 
> ..."
>
> I think this conflicts with the current OpenDocument specification, 
> which says that a namespace prefix should be used. My suggestion would 
> be that we state in the ODF 1.2 spec that a formula that has no prefix 
> is an OpenFormula formula, and to state the same in the OpenFormula 
> specification in a non-normative way. I'm not sure if we can do 
> anything for ODF 1.0/1.1.
It does, but it's only a "should" in OpenDocument.  The rationale for doing
it this way is that the OpenDocument spec is expressing a "should" so that
there's room left for a default formula syntax.  Since OpenFormula _is_ that
default, we're simply occupying the room that OpenDocument expressly
reserved for us.

> In general, I think the normative description how formulas are used 
> within OpenDocument should be in the OpenDocument specification itself.
>
> "Namespace_in_XML ::= http://www.w3.org/TR/REC-xml-names";
>
> Does an URI in the BNF used here has a special meaning? If not, what 
> about
Yes.
> "Namespace_in_XML ::= Prefix /* A prefix as declared in section 4 of 
> [xml-names] */
>
> I further would rename "Namespace_in_XML" into "Namespace_prefix"

That'd be fine.  Since this has no effect on what is actually being specified,
only on its presentation, I want the presentation to be whatever will
be clearest to its readers.

> Section 5.2
>
> "simple" -> "simply"
>
> "The primary component of a formula is an expression"
>
> Suggestion:
>
> "The primary component of a formula is an Expression"
>
> This makes clear you are not talking about expressions in general, but 
> about the BNF production Expression. You may also use a fixed font in 
> this case to make it look like the BNF.
Ah, that's where you mean the fixed fonts should be used.
Sure, we could do that.  So let's do that unless someone objects.

There's a risk that people might not notice when the font changes
(esp. those with accessibility needs), but as long as the font is there for
emphasis and is not critical for understanding the sentence, that'd be helpful.

>
> "SingleQuoted ::= "
>
> Why does "SingleQuoted" appear here? It's not used in this section.

It's reused in several places.  And when you IMPLEMENT an expression,
it turns out to be important, because SingleQuoted can initiate several
different kinds of expressions.
> "There is no special syntax for the logical constants for truth and 
> falsity, since this is unnecessary;  simply use the ..."
>
> I was first confuses to read this in that section, because "constants" 
> have not been introduced before, but noticed later that "Number" and 
> "String" are of course constants. So my suggestion would be rephrase 
> this similar to
>
> "While the formula syntax defines literal numbers and strings, it does 
> not define literal string constants. Instead, the standard functions 
> TRUE() and FALSE() [add a reference here] can be used."

Excellent!  Yes, let's do that.

> Section 5.3
>
> Is there a better name for "WrittenNumber" and the the term "written" 
> in the description?
I'm sure there is, but I'm at a loss as to what it is. Suggestions?
>
> Is there a formal reference to "C" or en-US locale. If not, I suggest
> to remove the reference to locales.
We may as well have a formal reference; it's important for VALUE.
Can anyone suggest the proper reference?

> Section 5.4
>
> "Note that when a formula is stored as an XML attribute (the usual 
> case), XML quoting rules apply: thus double-quote characters are 
> recorded in the XML such as &quot;, and carriage return characters in 
> the String are recorded as &#x0D;.A constant string as defined by this 
> syntax, shall be considered to be type Text."
>
> Suggestion:
>
> "Note that when a formula is stored as an XML attribute, XML escaping 
> rules apply: thus double-quote characters must be escaped as &quot;, 
> and carriage return characters in string constants as &#x0D;. A 
> constant string as defined by this syntax, shall be considered to be 
> type Text."
>
> (It's not essential to change that, but I noticed that the 
> introduction uses the term "escape", and this section the term 
> "quote". Please note the missing space before the last sentence).

Good.

> Section 5.5
>
> "There two predefined" -> "There are two predefined"
>
> "written as &amp;" -> "must be escaped using &amp;"
>
> Sometimes the operator symbols (+, -, etc) are mentioned in the
> description and sometimes this is not the case.
>
> "Also note that while prefix "+" and "-" are right-associative, because
> "+" is a no-operation, applications which implement at most these
> operators, using only the semantics defined by this specification, may
> implement them as left-associative since the results will be identical."
>
> I have to admit that I don't understand that sentence.
We may be trying too hard here.

If you think about it, prefix "+" and "-" should be considered
right-associative, e.g., to compute:
   =  +-[.A1]

You'd first compute -([.A1]), and then compute +(previous_result).

Well, that's all well and good, but in fact, if you compute them left-to-right
you'll get the same answer, because of the way prefix + and prefix - work.

So why not just say left-to-right?  Because an application could add OTHER
prefix operations, and if they do, then suddenly the right-to-leftness would
actually matter!
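
A small Python sketch of that argument, under stated assumptions: "+" and
"-" behave as described above, while "!" (mapping x to 1 - x) is a purely
hypothetical application-defined prefix operator invented for this example;
it is not in the draft.

```python
# Prefix operators modelled as unary functions.
PREFIX_OPS = {
    "+": lambda x: x,        # no-operation
    "-": lambda x: -x,       # negation
    "!": lambda x: 1 - x,    # hypothetical extension, NOT part of the draft
}

def apply_right_to_left(ops, value):
    """Right-associative: the operator nearest the operand is applied first."""
    for op in reversed(ops):
        value = PREFIX_OPS[op](value)
    return value

def apply_left_to_right(ops, value):
    """The opposite order: the leftmost operator is applied first."""
    for op in ops:
        value = PREFIX_OPS[op](value)
    return value

# With only "+" and "-", both orders agree.
same_a = apply_right_to_left("+-", 5)
same_b = apply_left_to_right("+-", 5)

# With the hypothetical "!", the order suddenly matters:
# -(!(5)) = -(1 - 5) = 4, but applying "-" first gives 1 - (-5) = 6.
diff_a = apply_right_to_left("-!", 5)
diff_b = apply_left_to_right("-!", 5)
```

The two orders agree for "+-" only because identity and negation commute;
any added operator that does not commute with negation breaks that.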

Thoughts?

> "Implementations' user interfaces may display these operators 
> differently or with a different precedence, but when exchanging 
> formulas they shall use the precedence rules here."
>
> Suggestion:
>
> "Implementations' user interfaces may display these operators 
> differently or with a different precedence, but conforming 
> applications must store formulas using the operator names and 
> precedences defined in this specification."
>
> In general, I would not use the term "exchange", because this sounds 
> like OpenFormula being only an exchange format, what it clearly is not.

?? I don't understand.  It _IS_ an exchange format.  At least, we're
NOT requiring anything about the user interface.

> Section 5.6
>
> "Functions are called by giving their named" -> "Functions are called 
> by giving their name"
>
> "LetterXML ::= ", "DigitXML ::= ", "CombiningCharXML ::=":  See my 
> comment on URL's above.
>
> "User-defined function names may use an arbitrary identifier,"
>
> Does that mean that user-defined function names have to follow the 
> production "Identifier"? Or does it mean that they can really use an 
> arbitrary identifier?
> I assume the first is the case.
Yes, you're right, Identifier was intended.

> To make this clearer, I suggest to either format BNF grammar names 
> (for instance by using a fixed font), or to add numbers to the 
> productions and to reference them. The first option probably is much 
> easier to implement, and also easier to maintain.

Yes, agree on all counts.  And an excellent argument for modifying the case.
We should also always use initial-caps, as you've done in your example...
that would make it even more obvious.
> "Applications may (and often do) display a different function name in
> their user interface than ..."
>
> Suggestion:
>
> "Implementations' user interfaces may translate function names,
> may omit application prefixes, or may replace the function names 
> defined in this specification with arbitrary other function names, but 
> conforming applications shall not use these functions names when 
> storing formulas."

Yes.
>
>
> I further suggest to define what the application prefixes are, or to
> reference section 5.7 and to use the terminology from that section.
>
> Section 5.7
>
> Is there a definition somewhere what the "standard" names are?
Well, it's all the functions defined in the spec.  That's a good catch;
we need to make that clear.
> "Applications that do not support a function should compute its result 
> as some Error value other than NA() when calculating its result."
>
> ->
>
> "Applications that do not support a function should compute its result 
> as some Error value other than NA()."

Yes.

> Section 5.8
>
> "in the storage format": Are there different formats? If not, remove this.
There's the (unspecified) display format.
> "Where possible, applications should use references with embedded ":"
> separators, instead of the general-purpose ":" operator, when saving
> files, and where there is a choice of cells to join, and application
> should choose the leftmost pair. "
>
> This sentence becomes only clear then reading the next one. Reference to
> definitions of the two ":" may help.
>
> Are the explanations regarding URIs and relative IRIs/URIs required, 
> or would it be sufficient to state that the could be absolute or 
> relative IRIs?
That WOULD probably be sufficient.  If the reader doesn't know what the
difference is, this isn't the place to explain it.
> Section 5.9
>
> "Automatic lookup of labels can be enabled or disabled in the document 
> settings."
>
> Are these the application specific setting defined in OpenDocument, or 
> something else? A reference may be helpful here.
Yes.  Should we reference particular locations in the OpenDocument spec?
My concern is renumbering.
> Section 5.10
>
> "Applications supporting named expressions must support named 
> expressions that are global to all the sheets in a (spreadsheet) 
> document in the current document (this is a named expression without a 
> Source, QuotedSheetName, or SubtableCell)."
>
> - must is not in our control language any longer.
>
> Would it be sufficient to say:
>
> "Applications supporting named expressions shall support named 
> expressions that are global to all sheets in the current (spreadsheet) 
> document"
That's fine.
>
> Is there a definition of "portable documents"?

Yes, I'm pretty sure there is.  I'll check.
>
> Would it be possible to distribute the descriptions of the various 
> named expression types and the grammar to the subsection?
I did that originally, and people said that it was too confusing.

Maybe this just shows that you can't please everyone :-).


> Section 5.11
>
> "The inline error value NA shall be represented as "#N/A" when 
> represented as an inline error, and this is portable across 
> implementations."
>
> My understanding is that "#N/A" is the only portable error value, so 
> my suggestion is to write:
>
> "The only portable error values is "#N/A", which represents the NA 
> inline error value."

Yes.
>
> The sentence
>
> "Portable documents should not use inline error values other than 
> #N/A, as error values are not necessarily portable between applications."
>
> may be omitted then.

Okay.
>
> Section 5.12
>
> "Applications that support inline arrays must accept"
>
> ->
>
> "Applications that support inline arrays shall accept"
Okay
>
> Section 5.13
>
> "Whitespace (space, tab, newline, and carriage return) is ignored in 
> the default formulas syntax, except inside the contents of string 
> constants and text surrounded by single quotes"
>
> Is there another than the default formula language?
Haha!  Good point.  Okay, strike "default".

> Suggestion
>
> "Whitespace (space, tab, newline, and carriage return) shall be 
> ignored, except it occurs inside the contents of string constants and 
> text surrounded by single quotes"
>
> Would it be an option to extend the grammar with whitespace 
> characters, as it is the case of XML 1.0?
It's an option we tried earlier, but we found it to be a problem.
Eike started it that way, but the grammar gets insanely complicated, and quickly.
You spend all your time documenting the stuff that's ignored, and the
resulting grammars are much harder to read.
What's worse, the "obvious" implementations are unworkable, because
the whitespace makes all the grammar rules ambiguous.  E.g.:
"There are ten possible rules, and they all begin with optional whitespace
followed by another token.  I just got whitespace; which rule do I use?"
The "fix" is to rewrite the rules into what we've shown... so why not
write them this way in the first place?

Yes, XML does things differently.  But whitespace handling in XML is
a notorious problem, so I wouldn't hold that up as a virtue.

Almost all language specs do things this way: they tell you that whitespace
is ignored almost everywhere, and then document the exceptions.
Tools like lex let you do that too, so it's friendlier to implementors
to do it this way.
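
As an illustration of the lex-style approach, here is a minimal Python
tokenizer sketch.  The token set is deliberately tiny and hypothetical (it
is not the draft's grammar), and the doubled-quote escape inside strings is
an assumption made for the example.  Whitespace is matched as a token and
then dropped, so no grammar rule ever has to mention it; whitespace inside
double-quoted strings and single-quoted text survives because those are
matched as single tokens.

```python
import re

# Simplified, hypothetical token set, for illustration only.
TOKEN_SPEC = [
    ("ws",     r"[ \t\r\n]+"),                 # space, tab, CR, newline
    ("string", r'"(?:[^"]|"")*"'),             # assumes "" escapes a quote
    ("quoted", r"'(?:[^']|'')*'"),             # assumes '' escapes a quote
    ("number", r"[0-9]+(?:\.[0-9]+)?"),
    ("name",   r"[A-Za-z_][A-Za-z0-9_.]*"),
    ("op",     r"<>|<=|>=|[-+*/^&=<>(),;:]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(formula):
    """Return (kind, text) pairs, discarding whitespace between tokens."""
    tokens = []
    pos = 0
    while pos < len(formula):
        match = TOKEN_RE.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}: {formula[pos]!r}")
        if match.lastgroup != "ws":            # whitespace is simply skipped
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

spaced = tokenize("= 1 +\t 2")
compact = tokenize("=1+2")
with_text = tokenize("=\"a  b\" & 'Sheet 1'")
```

`spaced` and `compact` come out identical, while the spaces inside
"a  b" and 'Sheet 1' are kept as part of their tokens.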


Thanks SO MUCH for all your comments.  As you can see, for the
most part I think we should just do as you've suggested.
I'd love to hear suggestions for the other parts.

--- David A. Wheeler