Subject: Extending the PERCENTILE() Function

From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Tue, 03 Jun 2008 23:49:44 +0300

Dear list members,

I propose the following extension to the percentile function:

Syntax: PERCENTILE( NumberSequence Data ; Number x )
Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x )

where x:
    is a number between 0 and 1, or
    a sequence of numbers, all between 0 and 1.

RATIONALE
==========
1.) a percentile is seldom useful alone

2.) percentiles take long to compute (see below)
  -- current spreadsheet implementations are likely to use
      a sorting algorithm, so, whenever a different percentile
      is computed, the program needs to sort the initial array again
  -- implementing a sensible caching mechanism would help, BUT
      it puts the burden on the implementors and might not always
      work well
  -- implementing a fast algorithm might be useful
     [as I mentioned in a previous post, see section B of
      http://lists.oasis-open.org/archives/office-comment/200706/msg00012.html],
     but so far the TC has not shown a favourable opinion towards
     defining a fast set of functions [users would usually know when
     to use such a set]
Is this really relevant?
I recently drafted a spreadsheet [1] to test the computational limits of
OOo (and of spreadsheets in general), so everyone can judge for himself.
The spreadsheet might look evil (I did develop it with the limitations
of spreadsheets in mind), but please consider that, as modern analysis
approaches shift to robust statistical methods, such analysis becomes
more and more compute-intensive. Such methods usually involve some
resampling and a lot of sorting (as used in naive algorithms).

[1] http://www.openoffice.org/issues/show_bug.cgi?id=89976

Because computing all the percentiles at once would need only one sort 
operation, it makes sense to extend the PERCENTILE() function in this way.
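
As a rough illustration of the point above, here is a minimal Python sketch of
a multi-percentile function that sorts the data once and then reads every
requested percentile off the sorted array. The function name percentiles() and
the interpolation rule (linear interpolation between the two closest ranks)
are assumptions made for this example only; the proposal itself does not
prescribe either.

```python
import math
from typing import List, Sequence


def percentiles(data: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Return one percentile of `data` per entry of `xs`, sorting only once.

    Hypothetical helper for illustration; the interpolation convention
    (linear, between closest ranks) is an assumption, not part of the proposal.
    """
    if not data:
        raise ValueError("data must not be empty")

    ordered = sorted(data)  # the single O(n log n) step
    n = len(ordered)

    results = []
    for x in xs:
        if not 0.0 <= x <= 1.0:
            raise ValueError("each x must lie between 0 and 1")
        pos = x * (n - 1)            # fractional index into the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)      # clamp so that x = 1 stays in range
        frac = pos - lo
        # Each lookup is O(1) once the data is sorted.
        results.append(ordered[lo] + frac * (ordered[hi] - ordered[lo]))
    return results


if __name__ == "__main__":
    sample = [4, 1, 3, 2, 5]
    print(percentiles(sample, [0.0, 0.25, 0.5, 1.0]))
```

With k requested percentiles this costs one sort plus k constant-time lookups,
whereas k separate single-percentile calls would, in a naive implementation,
repeat the sort k times.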

Sincerely,

Leonard

P.S. Opening the first spreadsheet will take a huge amount of time. I
recommend opening the 2nd one, filling the column up to 30,000 rows,
and only then judging whether to proceed further down (the spreadsheet
will likely behave worse than O(n^2)).
