office-formula message


Subject: RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and DECIMAL() definitions list character set?


Assuming this was intended for the list as well...
___________________________

"Dennis E. Hamilton" <dennis.hamilton@acm.org> wrote on 05/06/2009 
05:57:16 PM:

> 
> RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and
> DECIMAL() definitions list character set?
> 
> Yes, I meant that encoding parameter.
> 
>  - Dennis
> 
> Here's more thinking about that.  I am doing this off the top of my head
> without digging out a recent draft of OpenFormula, so I apologize if I am
> covering well-worn ground:
> 
> 1. It seems reasonable to me that OpenFormula be expressed in terms of
> Unicode (not an encoding), which means it also doesn't have anything to say
> about XML character entities or anything like that.
> 

Yes.


> I don't know enough about the OpenFormula specification to know how any
> character-escaping (if any) is handled.  But that could be kept at the
> Unicode level, without reference to an encoding.
> 

There are several levels of "encoding":

1) Unicode level mapping of Unicode strings into bytes.

2) XML level format, including the use of numeric character references, etc.

3) Any OpenFormula level escaping.  For example, I think we have an 
UPPER() function to turn each character in a string to upper case. Well, 
if string parameters are delimited by quotation marks, how do we escape a 
quote literal in the string?

I'm assuming we define OpenFormula at that 3rd level only, at least within 
the OpenFormula part.  But in the main part we can say table:formula is an 
xsd:string, with a value that is a conforming OpenFormula expression. 
Calling it xsd:string in an XML file triggers the other constraints at 
levels 1 and 2 in the above model. 
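
A minimal sketch of the three levels, in Python. It assumes that a quotation mark inside a formula string literal is escaped by doubling it, which is one common convention and not something settled in this thread. The `cell` element and `formula` attribute are hypothetical stand-ins for the real ODF markup, and `formula_string_literal` and `serialize` are illustrative helper names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def formula_string_literal(text: str) -> str:
    """Level 3: quote a string for use inside a formula.

    Assumption: an embedded quotation mark is escaped by doubling it.
    """
    return '"' + text.replace('"', '""') + '"'


# Level 3: the formula is a plain Unicode string, with no XML in sight.
formula = "=UPPER(" + formula_string_literal('say "café"') + ")"

# Level 2: carry that string as an XML attribute value.
# quoteattr() picks the attribute delimiter and escapes what XML requires.
element = "<cell formula=" + quoteattr(formula) + "/>"


def serialize(encoding: str) -> bytes:
    """Level 1: map the XML text to bytes under the declared encoding."""
    declaration = '<?xml version="1.0" encoding="%s"?>' % encoding
    return (declaration + element).encode(encoding)


utf8_bytes = serialize("UTF-8")
latin1_bytes = serialize("ISO-8859-1")

# The bytes differ, but an XML parser hands back the same Unicode string.
recovered = [ET.fromstring(data).get("formula")
             for data in (utf8_bytes, latin1_bytes)]
```

Only `formula_string_literal` concerns OpenFormula. The other two steps are handled entirely by the XML layer, which is the partition argued for above.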

> 2. One consequence of taking this approach is that one has to be cognizant
> of the fact that the formulas may be carried as attribute values in
> XML-based implementations and they will (1) have to be represented in the
> character-set encoding of that representation and (2) appropriately
> contained in quotations where the extent of the formula is unambiguously
> determined.  How that's done would seem to fall entirely on the
> implementation that carries the formulas, not OpenFormula itself.  ODF 1.2
> might need to say something about it, in its definition for table-cell
> formula, but probably not if XML attribute-value representation rules and
> the use of attribute value-type string are sufficient.
> 

As above.  I think saying it is an attribute value of type xsd:string 
should be sufficient.  An informative example might be useful as an 
illustration.

> 3. I think it would be useful to know, in a pure-Unicode approach, what
> minimum set of Unicode characters are required to be usable to express
> OpenFormula and what others might be allowed (e.g., in the rules for names
> and the expression of string-valued literals used within a formula).  A BNF
> grammar that appealed to Unicode character categories could set the
> ceilings, but it is useful to know if there is a different floor.
> 

Functions that take a string parameter may be unconstrained, other than the 
constraints on what is allowable in attribute values. It might be 
worth a quick look at XPath, which also has a simple expression language 
with a function library represented in XML attribute values, to see how 
they handle this.  We can probably copy their approach.
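
As a rough illustration of what "allowable in attribute values" means at the character level, the sketch below checks a string against the Char production of XML 1.0. The function names are hypothetical, and this covers only which code points may appear. It says nothing about any further limits OpenFormula itself might choose to impose.

```python
def is_xml_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def allowed_in_attribute(text: str) -> bool:
    """True if every character of text can be carried in an XML 1.0 document."""
    return all(is_xml_char(ch) for ch in text)
```

Characters such as `<`, `&` and the attribute's own delimiter pass this check because they can be written as escapes. A literal tab or line break in an attribute value is normalized to a space by the XML processor unless it is written as a character reference.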

-Rob


> -----Original Message-----
> From: robert_weir@us.ibm.com [mailto:robert_weir@us.ibm.com] 
> Sent: Wednesday, May 06, 2009 14:24
> To: dennis.hamilton@acm.org
> Cc: 'David A. Wheeler'; 'Eike Rathke'; 'Michael Brauer'; 'OASIS ODFF SC';
> patrick@durusau.net
> Subject: RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and
> DECIMAL() definitions list character set?
> 
> I'm assuming you mean the character encoding expressed at the XML level,
> such as:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?> 
> 
> Do we need to say anything more here than what the XML Recommendation 
> says?  What else would you have?  Restrict encoding to an enumerated 
> subset of encodings? 
> 
> In any case, I think things are cleaner if we keep OpenFormula independent
> of XML, since there is no necessary connection to markup at all, if we
> partition this right.
> 
> OpenFormula should define conformance for two things:
> 
> 1) The syntax of a conforming OpenFormula expression, which would be 
> expressed as BNF and other constraints on a Unicode string
> 
> 2) Constraints on the values returned by a conforming OpenFormula 
> expression 
> 
> -Rob
> 
> 
> "Dennis E. Hamilton" <dennis.hamilton@acm.org> wrote on 05/06/2009 
> 03:38:32 PM:
> 
> > 
> > Subject:
> > 
> > RE: [office-formula] Re: (fwd) Should OpenFormula BASE() and
> > DECIMAL() definitions list character set?
> > 
> > Does there need to be something said about what happens when the content.xml
> > (or any other XML file) uses an encoding other than one for Unicode? My
> > understanding is that other encodings are referenced to Unicode in and out,
> > but not sure whether this has been clarified anywhere in ODF specifications
> > nor in OpenFormula, where the ability to correctly interpret and to preserve
> > on export may be impaired (though hopefully not for any of the printable
> > Basic Latin characters of Unicode).
> > 
> >  - Dennis 
> > 
> > -----Original Message-----
> > From: robert_weir@us.ibm.com [mailto:robert_weir@us.ibm.com] 
> > Sent: Wednesday, May 06, 2009 11:34
> > To: David A. Wheeler; Eike Rathke; Michael Brauer; OASIS ODFF SC;
> > patrick@durusau.net
> > Subject: [office-formula] Re: (fwd) Should OpenFormula BASE() and DECIMAL()
> > definitions list character set?
> > 
> > I checked with our local Unicode guru, to make sure we were expressing
> > this right.  He confirmed that it is correct to refer to the "value space"
> > as Unicode "characters" and "strings", and the serialized versions as
> > "encoded characters".
> > 
> > [ ... ]
> > 
> 
> 