[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

*Subject*: **Re: [office-comment] Extending the PERCENTILE() Function**

*From*:**Leonard Mada <discoleo@gmx.net>***To*: office-comment@lists.oasis-open.org*Date*: Sat, 07 Jun 2008 01:53:17 +0300

Hello everyone, after some thoughts, I came up with extending the percentile function even further: This extension covers some common percentiles / percentile combinations. 1.) PERCENTILE( NumberSequence Data ; Number x ) - where x is a number between 0 and 1 - this is the existing function, computing a single percentile 2.) {PERCENTILE( NumberSequence Data ; NumberSequence x ) } - where x is a sequence of numbers, every number in the sequence being between 0 and 1 - function computes all the percentiles defined by x - returns an array 3.) {PERCENTILE( NumberSequence Data ; String x ) } where x can be one of: - "q" or "quartile": computes: minimum, 1st quartile, median, 3rd quartile, maximum - "p" or "percentile": computes all 101 percentiles - "IPR95": computes the 0.025, 0.50 (median) and 0.975 percentiles Hopefully I get the necessary support for these changes. Sincerely, Leonard Leonard Mada wrote: > Dear list members, > > I propose the following extension to the percentile function: > > Syntax: PERCENTILE( NumberSequence Data ; Number x ) > Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x ) > > where x: > is a number between 0 and 1, or > a sequence of numbers, all between 0 and 1. > > RATIONALE > ========== > 1.) a percentile is seldom useful alone > > 2.) percentiles take long to compute (see below) > -- current spreadsheet implementations are likely to use > a sorting algorithm, so, whenever a different percentile > is computed, the program needs to sort the initial array again > -- implementing a wise caching mechanism is useful, BUT > puts the burden on the implementors and might not always > work well > -- implementing a fast-algorithm might be useful > [as I mentioned in a previous post, see section B of: > > http://lists.oasis-open.org/archives/office-comment/200706/msg00012.html] > but somehow the TC did not show any favourable opinion to define > a fast set of functions [users would usually know when to use such > a set] > > Is this really relevant? > I drafted recently a spreadsheet [1] to test the computational limits > of OOo (and spreadsheets in general), so everyone can judge for > himself. The spreadsheet might look evil (I did develop it with the > limitations of spreadsheets in mind), but please consider that, as > modern analysis approaches shift to robust statistical methods, such > analysis becomes more and more computer intensive. Such methods > usually involve some resampling and a lot of sorting (as used in naive > algorithms). > > [1] http://www.openoffice.org/issues/show_bug.cgi?id=89976 > > Because computing all the percentiles at once would need only one sort > operation, it makes sense to extend the PERCENTILE() function in this > way. > > Sincerely, > > Leonard > > P.S. Opening the first spreadsheet will take a huge amount of time. I > recommend opening the 2nd one and filling the column up to 30,000 rows > and judge then IF one wants to proceed further down (the spreadsheet > will likely behave worse than O(n^2)). >

**References**:**Extending the PERCENTILE() Function***From:*Leonard Mada <discoleo@gmx.net>

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]