office-formula message

Subject: Re: [office-formula] Grouping functions and capabilities
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: robert_weir@us.ibm.com, office-formula@lists.oasis-open.org
Date: Wed, 12 Jul 2006 14:39:39 -0400 (EDT)
I said:
> > We still need leveling "inside" a single function, because some
> > functions have
> > both a minimal capability (e.g., with limited domain or lack support
> > of certain options),
> > vs. a larger capability, and I'd expect that to be captured here too.

Robert Weir replied:
> Thinking outside of the spec for a second -- I'm wondering if there is a 
> better way to solve this problem?  Why do we have implementations that 
> only support a subset of a functions capabilities?  Lack of code?  If so, 
> let's get the full implementations of all of the code available as open 
> source under a licence that allows everyone to use it.  Solve the problem 
> that way. 
> I'm thinking if we have only a small number of functions like that, 
> getting the code available may solve the problem.  If we have a large 
> number of functions like that, then we have a bigger problem.
> 
> I'd rather not specify a mess just because the mess exists.  I'd rather 
> specify the way things should be and then make it easy for implementations 
> to catch up.

That's a fair response.  Certainly it'd be simpler for all users if there was
no such distinction, and it would make the spec simpler too.
I would LOVE for there to NOT be such a distinction.  Going into this,
I couldn't presume we could do it, and allowing it made it
easier to get started (it was easy to document what did NOT work).

I don't know the answer in general for all functions, but I looked at the
current draft spec for the "level 1" functions.  In theory, these should be the
ones that are most similar (have the fewest differences like that), so this may
be a little misleading, but let's start there.

The details are below, but I think there's a good case for dropping
specialized leveling inside a single function, as you've recommended.
In some cases we can just leave
things unspecified. E.g., AND and OR with one parameter...
who cares?  In some areas it's an obvious extension, e.g., OpenOffice.org's
LOG function _requires_ the second parameter, but for others it's
optional (default 10)... we could simply require it.  In some areas
it's pointing out bugs.. we can require it, or note it as undefined, as appropriate.
We could leave something unspecified but with a should/recommended
statement... that might encourage implementations to converge in the future.
I'm not sure about the COUNTIF/SUMIF example, need to investigate that.

One issue is DATE and TIME... some apps return an error
if some params are too "large" or "small", but some users (esp. of Excel)
use them to find the "next day" and "previous" day by exploiting
rollover.  I think we should spec that apps NOT return an error,
but instead return the rolled-over day.. it's useful to roll over to the "next day",
and if you really want to see if a day is "illegal" there are other ways to do it.

Obviously this implies more changes to the apps to meet the spec, though
if this is an indication, it's not too bad.

So shall we try to specify functions without grouping inside them?
I think we can try, especially if we document in notes why we leave some
things unspecified where we do.  Does anyone object to that?
We can still have groups, but for now
a function wouldn't get credit until it completely
implemented all the required functionality.  (We can undo this later if
that turns out to be unworkable.)

So, everyone -look at the list below, and the draft spec, and post what
things you think should be specified, and what should be unspecified.
Otherwise, I'll soon make a guess & put my guess into the next draft, about
which we can then kibitz.  The draft will include a discussion about this in
each function, so we won't lose knowledge by doing that.

Details below.

--- David A. Wheeler

=== My analysis of the differences ===

These aren't the only differences between applications, obviously.
We have differences on "is there a distinct logical type" and text->number
auto-conversions, which we have discussed at length.  Applications differ
on whether or not the string "TRUE" is true as well, and a few other nitnoids like that.
The database comparison expression languages differ (sigh).
Some differences are influenced by other OpenDocument settings
(e.g., if comparisons are case-sensitive) - for those, just use the right settings
(no issue).

But looking at functions to see where they have different levels
of functionality inside the SAME function, for at least the old "level 1"
functions they ARE a small list:
* DATE, TIME: Different implementations handle rollovers and fractions differently.
   We could certainly define a particular set of requirements.
* COUNT, COUNTA: How should errors, strings, logicals be handled?  No param?
* COUNTIF/SUMIF: an oddity involving typing; need to investigate.
* VALUE: Some date formats, what do you do with non-text
* CHOOSE/ISERROR: OOo doesn't capture errors with ISERROR correctly; this
   could easily be considered a mistake in OOo.
* HLOOKUP/VLOOKUP: Excel 2003 returns garbage on not-found.. an error
   should really be the result instead. Another Excel bug, methinks.
* AND/OR: What if given only 1 param?  Granted, that's pretty useless, we could
   easily get away ignoring this
* IF: empty parameter for else, converting condition
* LOG: OOo _requires_ 2 params, while most everyone else it's optional (Excel, etc.)
* MIN/MAX if range has no numbers.  Excel & OOo don't support, Gnumeric does.
* MOD: KOffice used to differ, but I think now everyone gets the same answers
* ROUND: How do you handle ROUND(-1.5)?
* CHAR: Can CHAR(10) be used for newline?
* EXACT: When given non-text
* LEFT/RIGHT: If asked for 0 or 2^32 chars
* REPLACE/SUBSTITUTE: If asked to replace 0 chars

ACOS/COS has a difference, but I think that's
really a number conversion issue, not ACOS. Some of the "differences"
like PRODUCT are really conversion issues.

We originally used higher levels to spec higher accuracy requirements (e.g., PI()).
That could be handled differently, and probably should.  Or just don't worry
about it at this stage... basically everyone uses doubles or a perfect-number
library anyway, so it's not a distinguishing characteristic (nor is it impacting
interoperability).
Follow-Ups:
- Re: [office-formula] Grouping functions and capabilities
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>
References:
- Re: [office-formula] Grouping functions and capabilities
  - From: "David A. Wheeler" <dwheeler@dwheeler.com>
- Re: [office-formula] Grouping functions and capabilities
  - From: robert_weir@us.ibm.com