office-comment message

Subject: Re: [office-comment] Extending the PERCENTILE() Function
From: Leonard Mada <discoleo@gmx.net>
To: office-comment@lists.oasis-open.org
Date: Sat, 07 Jun 2008 01:53:17 +0300
Hello everyone,

after some thoughts, I came up with extending the percentile function 
even further:

This extension covers some common percentiles / percentile combinations.

1.) PERCENTILE( NumberSequence Data ; Number x )
     - where x is a number between 0 and 1
     - this is the existing function, computing a single percentile

2.) {PERCENTILE( NumberSequence Data ; NumberSequence x ) }
     - where x is a sequence of numbers,
        every number in the sequence being between 0 and 1
     - function computes all the percentiles defined by x
     - returns an array

3.) {PERCENTILE( NumberSequence Data ; String x ) }
where x can be one of:
    - "q" or "quartile": computes: minimum, 1st quartile, median, 3rd 
quartile, maximum
    - "p" or "percentile": computes all 101 percentiles
    - "IPR95": computes the 0.025, 0.50 (median) and 0.975 percentiles

Hopefully I get the necessary support for these changes.

Sincerely,

Leonard


Leonard Mada wrote:
> Dear list members,
>
> I propose the following extension to the percentile function:
>
> Syntax: PERCENTILE( NumberSequence Data ; Number x )
> Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x )
>
> where x:
>    is a number between 0 and 1, or
>    a sequence of numbers, all between 0 and 1.
>
> RATIONALE
> ==========
> 1.) a percentile is seldom useful alone
>
> 2.) percentiles take long to compute (see below)
>  -- current spreadsheet implementations are likely to use
>      a sorting algorithm, so, whenever a different percentile
>      is computed, the program needs to sort the initial array again
>  -- implementing a wise caching mechanism is useful, BUT
>      puts the burden on the implementors and might not always
>      work well
>  -- implementing a fast-algorithm might be useful
>     [as I mentioned in a previous post, see section B of:
>      
> http://lists.oasis-open.org/archives/office-comment/200706/msg00012.html]
>     but somehow the TC did not show any favourable opinion to define
>     a fast set of functions [users would usually know when to use such 
> a set]
>
> Is this really relevant?
> I drafted recently a spreadsheet [1] to test the computational limits 
> of OOo (and spreadsheets in general), so everyone can judge for 
> himself. The spreadsheet might look evil (I did develop it with the 
> limitations of spreadsheets in mind), but please consider that, as 
> modern analysis approaches shift to robust statistical methods, such 
> analysis becomes more and more computer intensive. Such methods 
> usually involve some resampling and a lot of sorting (as used in naive 
> algorithms).
>
> [1] http://www.openoffice.org/issues/show_bug.cgi?id=89976
>
> Because computing all the percentiles at once would need only one sort 
> operation, it makes sense to extend the PERCENTILE() function in this 
> way.
>
> Sincerely,
>
> Leonard
>
> P.S. Opening the first spreadsheet will take a huge amount of time. I 
> recommend opening the 2nd one and filling the column up to 30,000 rows 
> and judge then IF one wants to proceed further down (the spreadsheet 
> will likely behave worse than O(n^2)).
>
References:
- Extending the PERCENTILE() Function
  - From: Leonard Mada <discoleo@gmx.net>