[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Extending the PERCENTILE() Function
Dear list members, I propose the following extension to the percentile function: Syntax: PERCENTILE( NumberSequence Data ; Number x ) Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x ) where x: is a number between 0 and 1, or a sequence of numbers, all between 0 and 1. RATIONALE ========== 1.) a percentile is seldom useful alone 2.) percentiles take long to compute (see below) -- current spreadsheet implementations are likely to use a sorting algorithm, so, whenever a different percentile is computed, the program needs to sort the initial array again -- implementing a wise caching mechanism is useful, BUT puts the burden on the implementors and might not always work well -- implementing a fast-algorithm might be useful [as I mentioned in a previous post, see section B of: http://lists.oasis-open.org/archives/office-comment/200706/msg00012.html] but somehow the TC did not show any favourable opinion to define a fast set of functions [users would usually know when to use such a set] Is this really relevant? I drafted recently a spreadsheet  to test the computational limits of OOo (and spreadsheets in general), so everyone can judge for himself. The spreadsheet might look evil (I did develop it with the limitations of spreadsheets in mind), but please consider that, as modern analysis approaches shift to robust statistical methods, such analysis becomes more and more computer intensive. Such methods usually involve some resampling and a lot of sorting (as used in naive algorithms).  http://www.openoffice.org/issues/show_bug.cgi?id=89976 Because computing all the percentiles at once would need only one sort operation, it makes sense to extend the PERCENTILE() function in this way. Sincerely, Leonard P.S. Opening the first spreadsheet will take a huge amount of time. I recommend opening the 2nd one and filling the column up to 30,000 rows and judge then IF one wants to proceed further down (the spreadsheet will likely behave worse than O(n^2)).