OASIS Mailing List ArchivesView the OASIS mailing list archive below
or browse/search using MarkMail.

# office-comment message

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]

Subject: Extending the PERCENTILE() Function

• To: office-comment@lists.oasis-open.org
• Date: Tue, 03 Jun 2008 23:49:44 +0300

Dear list members,

I propose the following extension to the percentile function:

Syntax: PERCENTILE( NumberSequence Data ; Number x )
Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x )

where x:
is a number between 0 and 1, or
a sequence of numbers, all between 0 and 1.

RATIONALE
==========
1.) a percentile is seldom useful alone

2.) percentiles take long to compute (see below)
-- current spreadsheet implementations are likely to use
a sorting algorithm, so, whenever a different percentile
is computed, the program needs to sort the initial array again
-- implementing a wise caching mechanism is useful, BUT
puts the burden on the implementors and might not always
work well
-- implementing a fast-algorithm might be useful
[as I mentioned in a previous post, see section B of:

http://lists.oasis-open.org/archives/office-comment/200706/msg00012.html]
but somehow the TC did not show any favourable opinion to define
a fast set of functions [users would usually know when to use such
a set]

Is this really relevant?
I drafted recently a spreadsheet [1] to test the computational limits of
OOo (and spreadsheets in general), so everyone can judge for himself.
The spreadsheet might look evil (I did develop it with the limitations
approaches shift to robust statistical methods, such analysis becomes
more and more computer intensive. Such methods usually involve some
resampling and a lot of sorting (as used in naive algorithms).

[1] http://www.openoffice.org/issues/show_bug.cgi?id=89976

Because computing all the percentiles at once would need only one sort
operation, it makes sense to extend the PERCENTILE() function in this way.

Sincerely,

Leonard

P.S. Opening the first spreadsheet will take a huge amount of time. I
recommend opening the 2nd one and filling the column up to 30,000 rows
and judge then IF one wants to proceed further down (the spreadsheet
will likely behave worse than O(n^2)).

[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]