Subject: RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and DECIMAL() definitions list character set?

Assuming this was intended for the list as well...

"Dennis E. Hamilton" <dennis.hamilton@acm.org> wrote on 05/06/2009 
05:57:16 PM:

> RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and
> DECIMAL() definitions list character set?
> Yes, I meant that encoding parameter.
>  - Dennis
> Here's more thinking about that.  I am doing this off the top of my head
> without digging out a recent draft of OpenFormula, so I apologize if I am
> covering well-worn ground:
> 1. It seems reasonable to me that OpenFormula be expressed in terms of
> Unicode (not an encoding), which means it also doesn't have anything to do
> with XML character entities or anything like that.


> I don't know enough about the OpenFormula specification to know how any
> character-escaping (if any) is handled.  But that could be kept at the
> Unicode level, without reference to an encoding.

There are several levels of "encoding":

1) Unicode level mapping of Unicode strings into bytes.

2) XML level format, including the use of numerical character entities, 

3) Any OpenFormula-level escaping.  For example, I think we have an 
UPPER() function to turn each character in a string to upper case. Well, 
if string parameters are delimited by quotation marks, how do we escape a 
quotation mark literal in the string?

I'm assuming we define OpenFormula at that 3rd level only, at least within 
the OpenFormula part.  But in the main part we can say table:formula is an 
xsd:string, with a value that is a conforming OpenFormula expression. 
Calling it xsd:string in an XML file triggers the other constraints at 
levels 1 and 2 in the above model. 
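
To make the three levels concrete, here is a minimal Python sketch, not a 
statement of what OpenFormula specifies.  It assumes that a quotation mark 
inside a string literal is escaped by doubling it (a common spreadsheet 
convention, but an assumption here), and it uses a hypothetical element 
"cell" with a hypothetical attribute "formula" rather than the real 
table:formula attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def formula_string_literal(text):
    # Level 3 (ASSUMED rule, not taken from the spec): delimit with
    # quotation marks and double any quotation mark inside the string.
    return '"' + text.replace('"', '""') + '"'


# Level 3: the formula as a plain Unicode string.
formula = '=UPPER(' + formula_string_literal('say "hi" & café') + ')'

# Level 2: XML attribute escaping ('&' becomes '&amp;', quotes are handled).
# "cell" and "formula" are hypothetical names used only for illustration.
xml_text = '<cell formula=' + quoteattr(formula) + '/>'

# Level 1: the Unicode string is mapped to bytes by a character encoding.
xml_bytes = xml_text.encode('utf-8')

# An XML parser undoes levels 1 and 2 and hands back the level-3 string.
recovered = ET.fromstring(xml_bytes).get('formula')
```

The point of the sketch is that only formula_string_literal() would need 
to be defined by OpenFormula; the other two steps come from XML and the 
chosen encoding.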

> 2. One consequence of taking this approach is that one has to be aware
> of the fact that the formulas may be carried as attribute values in
> XML-based implementations and they will (1) have to be represented in the
> character-set encoding of that representation and (2) appropriately
> contained in quotations where the extent of the formula is unambiguously
> determined.  How that's done would seem to fall entirely on the
> implementation that carries the formulas, not OpenFormula itself.  ODF 
> might need to say something about it, in its definition for table-cell
> formula, but probably not if XML attribute-value representation rules and
> the use of attribute value-type string are sufficient.

As above.  I think saying it is an attribute value of type xsd:string 
should be sufficient.  An informative example might be useful as an 

> 3. I think it would be useful to know, in a pure-Unicode approach, what
> minimum set of Unicode characters are required to be usable to express
> OpenFormula and what others might be allowed (e.g., in the rules for 
> and the expression of string-valued literals used within a formula).  A 
> grammar that appealed to Unicode character categories could set the
> ceilings, but it is useful to know if there is a different floor.

Functions that take a string parameter may be unconstrained, other than the 
constraint that their values be allowable in attribute values. It might be 
worth a quick look at XPath, which also has a simple expression language 
with a formula library represented in XML attribute values, to see how 
they handle this.  We can probably copy their approach.
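
The floor-and-ceiling idea from point 3 could be sketched in Python as 
below.  The ASCII floor and the particular Unicode general categories used 
for the ceiling are assumptions chosen for illustration; neither is taken 
from an OpenFormula draft.

```python
import unicodedata

# ASSUMED floor: characters every implementation would have to accept.
ASCII_FLOOR = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789_"
)

# ASSUMED ceiling: Unicode general categories a grammar might allow
# (letters, decimal digits, connector punctuation such as '_').
CEILING_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Pc"}


def in_floor(name):
    # True if every character is in the required minimum set.
    return bool(name) and all(ch in ASCII_FLOOR for ch in name)


def in_ceiling(name):
    # True if every character falls in an allowed Unicode category.
    return bool(name) and all(
        unicodedata.category(ch) in CEILING_CATEGORIES for ch in name
    )
```

Under these assumptions a name such as "Total_1" is within both sets, 
while "Größe" is within the ceiling but above the floor, so an 
implementation would be allowed, but not required, to accept it.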


> In any case, I think things are cleaner if we keep OpenFormula independent
> of XML, since there is no necessary connection to markup at all, if we
> partition this right.
> OpenFormula should define conformance for two things:
> 1) The syntax of a conforming OpenFormula expression, which would be 
> expressed as BNF and other constraints on a Unicode string
> 2) Constraints on the values returned by a conforming OpenFormula 
> expression 
> -Rob
