Subject: Re: [office-formula] Proposal: Drop "huge" group; any critically missing functions for "large"?
From: "David A. Wheeler" <dwheeler@dwheeler.com>
To: office-formula@lists.oasis-open.org
Date: Thu, 27 Jul 2006 11:32:24 -0700 (PDT)
Andreas J Guelzow said:
> I do find it disturbing that Excel and OpenOffice.org apparently
> play a different role than the other spreadsheets.

Ah, I understand why you'd say that, because of the point at
which you joined.  If you'd joined a few months earlier, you'd have
asked why SheetToGo and wikiCalc play a different role.  When
we work on "huge", you'll ask why Gnumeric and Quattro Pro
play a different role :-).

It's actually not that way; this is a side-effect of
some earlier decisions.  They can be revisited, but I think
it'd be useful to explain why we are where we are.  And it'd
be a good idea to document what's going on here
in the mailing list.

So, here's my try, at least from my perspective.

When we formed the group, had the kickoff teleconference, etc.,
it was noted by Dan Bricklin (and agreed to by all)
that different applications target different markets/customers.
WikiCalc is not Gnumeric; they have different target users.
Thus, "one size fits all" requirements for functions, capabilities,
etc., were unacceptable for a truly universal spec.
If you're writing the specification for the interface of
a single application (e.g., Excel), it's fine to have a
simple list saying "you must implement exactly this". But that's
unacceptable if you want a REAL open standard for interoperability.
Indeed, look at all the discussion on namespaces, so we can be
sure that different applications can save their data in a common
format; we're serious about getting many different actors on board.

So we agreed that it'd be possible for applications to
arbitrarily subset and superset the spec.  That's very
flexible, but it leads to the next problem... how can I
create spreadsheet documents and know that other applications
can use them?  If every application implements nonoverlapping
sets, in the worst case nothing can interoperate.  Not okay.
The proposed solution is to document _groups_ (formerly levels)
that applications can assert that they conform to, and
that portable documents can say they need.
Applications don't have to implement a group, but many will
(if we provide reasonable groups to choose from).
There is no perfect way to determine a group; the best
way I know of is to look at existing implementations, and
appeal to this group to make tweaks.
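
As a rough illustration of the idea above (not spec text), groups can be
modeled as sets of function names.  The function names, the tiny "small"
set, and the two applications below are all made up for the example;
a real group would hold on the order of a hundred functions.

```python
# Hypothetical stand-in for a group; the real list would be far longer.
GROUP_SMALL = frozenset({"SUM", "AVERAGE", "IF", "MIN", "MAX"})


def conforms_to(app_functions, group):
    """An application conforms to a group if it implements every function
    in the group.  Extra functions (supersetting) are allowed."""
    return group <= app_functions


def portable_within(doc_functions, group):
    """A document is portable across all applications conforming to
    `group` if it uses only functions from that group."""
    return doc_functions <= group


# Two hypothetical applications that each superset the group differently.
app_a = frozenset(GROUP_SMALL | {"BITAND"})
app_b = frozenset(GROUP_SMALL | {"SSMEDIAN"})

doc_ok = frozenset({"SUM", "IF"})
doc_risky = frozenset({"SUM", "BITAND"})

print(conforms_to(app_a, GROUP_SMALL), conforms_to(app_b, GROUP_SMALL))
print(portable_within(doc_ok, GROUP_SMALL))
print(portable_within(doc_risky, GROUP_SMALL))
```

Both applications conform to the group even though their extras differ,
and only the document that stays inside the group is guaranteed to work
in both.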

The current main groups are small, medium, large, huge; the names
are based on the number of functions they include, and
are intentionally designed to separate different markets:
* Small is intended for PDA-sized devices and/or those who want
  less development effort, yet still provide "all the common
  functions" (whatever THAT means). We have an exemplar: SheetToGo
  on Palm PDAs.  The wikiCalc developers used that list and
  re-implemented that set of functions, demonstrating that it IS
  a reasonably implementable set without a massive time investment.
  About 100 functions.
* Medium is an intermediate step between Small and Large, based
  on "what most applications implement" - so this takes
  ALL applications into account.  This is a transitional group,
  for those applications moving from small towards large.
  This has around 200 functions.
* Large is based on the typical spreadsheet implementations that
  are included in desktop office suites, such as OpenOffice.org's Calc,
  Microsoft Excel, and what I understand to be
  the current goal of KOffice KSpread.  Around 300 functions.
* Huge is based on the spreadsheet implementations that
  are designed to provide the best possible spreadsheet formula
  capabilities, including the support of many useful but
  highly specialized functions.  These applications
  are often developed (at least originally) as stand-alone
  programs, not as part of an office suite (though they typically
  join one later).  Gnumeric and Quattro Pro fit in this class;
  both have over 400 functions (Gnumeric has the most).

Thus, when examining the functions for "large", we
look especially at Excel and OpenOffice.org, because those are
well-known examples of that market niche.  It's not because
they are "better" than all other apps.  In particular, we want to
make sure that existing users of Excel and OpenOffice.org
can transition _to_ OpenFormula without losing use of
their document files.... and in fact, we want them to be able
to exchange files right away with other implementations.
The issue isn't really "what do applications do", it's
"what do people's spreadsheet documents depend on"?
If many spreadsheet documents depend on something, we need to
give them a way to exchange that information between applications.

Rob Weir has correctly pointed out that using existing
applications, or this body of people, to group functions is
imperfect.  True.  But there's no higher authority we can appeal to,
and producing a spec where formulas do NOT interoperate is not okay.
So we'll use our collective best judgement, and I think the
results will be quite good.

And here's the problem: A lot of people really need, at most,
the "large" set.  So I propose that we work in stages -
let's get everything through the "large" set done, release
the spec, and then work on what's needed for the "huge" set.
The "huge" set includes the "large" set, so the current work
(even omitting "huge") is still 100% relevant for Gnumeric developers
and Gnumeric users.  In particular, I expect that many
spreadsheet documents created/modified by Gnumeric would
be completely covered by the "large" set.
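
To make the "covered by the large set" point concrete, here is a small
sketch that finds the smallest group covering a document.  It assumes
every group contains the one before it (the text above only states this
for "huge" and "large"), and the group contents are placeholders chosen
for the example, not actual assignments.

```python
# Placeholder group contents; which function lands in which group is
# purely illustrative.  Each group is assumed to include the previous one.
SMALL = {"SUM", "IF"}
MEDIUM = SMALL | {"VLOOKUP"}
LARGE = MEDIUM | {"BINOMDIST"}
HUGE = LARGE | {"SSMEDIAN"}

# Ordered from fewest functions to most.
GROUPS = [("small", SMALL), ("medium", MEDIUM), ("large", LARGE), ("huge", HUGE)]


def smallest_group(doc_functions):
    """Return the name of the smallest group containing every function
    the document uses, or None if no group covers it."""
    needed = set(doc_functions)
    for name, functions in GROUPS:
        if needed <= functions:
            return name
    return None


print(smallest_group({"SUM", "BINOMDIST"}))
print(smallest_group({"SUM", "SSMEDIAN"}))
print(smallest_group({"SUM", "NOT_IN_ANY_GROUP"}))
```

A document that only needs functions from "large" gets the same answer
no matter which application wrote it, including one that implements
"huge".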

> I would rather see many of the "nonsense" functions dropped
> that are apparently implemented in all spreadsheets than
> useful reasonable functions that currently are only
> implemented in some.

I like the way you think.  And in some cases that
may be what we should do. In others, maybe that's a
poor approach.  After all, we need to make sure
we have a transition approach; having a great train
is only useful if people can get to the station.
So let's discuss the best approach.

What I hope we're doing is leading to the future, while
reaching back and helping people get on board as necessary.
Per an earlier message, perhaps the right way is to have
a LEGACY group, or simply define BIN2DEC (etc.) without
requiring them in any particular group.
Sometimes, maybe we don't need to define these old
legacy nasties at all; that's particularly true if
they are essentially unused.  Our point is to faithfully exchange
spreadsheet DOCUMENTS, not to reimplement any particular app.

Together, let's find a way to look behind AND ahead.

> For example, is BIN2DEC really more important than
> BITAND or SSMEDIAN? (SSMEDIAN calculates a standard median used
> in the social sciences for discrete data with repetition.)
...
> Gnumeric has two functions called
> BINOMDIST(x,n,p,FALSE) and R.DBINOM(x,n,p,FALSE)
> that both supposedly do the same thing...
> From a mathematical point of view BINOMDIST has serious issues,
> for example:
> BINOMDIST(0.5,10,0.2,FALSE) is 0.107 rather than 0
> BINOMDIST(11,10,0.2,FALSE) is an Error rather than 0

Thanks!  Perfect!

That was the point of my request.  I don't think that the
"large" set should include all of Gnumeric's functions, some of
which are very specialized.  But a few of those functions
(at least their semantics) probably _do_ belong in a general-purpose
office suite spreadsheet application.  I had noted BITAND
specifically in my message, as you can see.  The semantic
issues that you raise are absolutely critical, too. So let's
identify the functions and semantic issues, and address them.
That'll help everyone interchange spreadsheet documents.
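
For reference, here is a sketch of the mathematically strict reading of
the binomial probability mass function that the quoted examples point
at: zero everywhere outside the integers 0..n.  This is only an
illustration of that reading, not the semantics of any existing
application or of the spec; the name binom_pmf is made up for the
example.  It needs Python 3.8+ for math.comb.

```python
import math


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p).

    Returns 0.0 for any x outside the support {0, 1, ..., n},
    including non-integer x, instead of truncating or raising."""
    if x != int(x) or x < 0 or x > n:
        return 0.0
    k = int(x)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


print(binom_pmf(0.5, 10, 0.2))  # non-integer x: outside the support
print(binom_pmf(11, 10, 0.2))   # x > n: outside the support
print(binom_pmf(0, 10, 0.2))    # 0.8 ** 10, about 0.107
```

The last call shows where the quoted 0.107 could come from: it equals
the probability at x = 0, which is what you would get if 0.5 were
truncated to 0 before evaluation.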

--- David A. Wheeler