Subject: Re: [office-comment] Extending the PERCENTILE() Function
robert_weir@us.ibm.com wrote:
> Leonard Mada <discoleo@gmx.net> wrote on 06/03/2008 04:49:44 PM:
>
>> Dear list members,
>>
>> I propose the following extension to the PERCENTILE function:
>>
>> Syntax: PERCENTILE( NumberSequence Data ; Number x )
>> Syntax: PERCENTILE( NumberSequence Data ; NumberSequence x )
>>
>> where x:
>> is a number between 0 and 1, or
>> a sequence of numbers, all between 0 and 1.
>
> Hi Leonard, just so I'm clear, the extension is the 2nd form of
> PERCENTILE you list, right? It would presumably be a vector function.

Yes, it is the 2nd form, and it is a vector (array) function. I forgot to mention that. [Well, it was quite late in the evening, 03 Jun 2008 23:49:44 +0300, so my apologies for the omission.]

> An unrelated question for you, since you've obviously spent some time
> thinking about spreadsheet performance -- now that we see increasing
> numbers of multi-core processors on desktop machines, what do you
> estimate would be the best way of taking advantage of this?

The major problem with existing spreadsheets is that they allow spaghetti code: one cell references another, which references yet another, and so on. You end up with complex dependencies (sometimes even circular references; I will come back to this in a later e-mail) that are not easy to parallelize. At the very least, the burden on the programmer will be quite high.

> It seems that in some cases you'll have a spreadsheet with thousands of
> formulas, but none of them especially slower than the others. In that
> case, a parallelizable calculation chain will split the load into two
> threads. But in some cases the calculation time might be dominated by
> just one or two cells. In that case, a parallel implementation of a
> particular function would give the greatest benefit.

In those cases, this is true. But a generic solution is not easy, because users may combine references in the weirdest ways, and the penalties may arise from numerous unnecessary recalculations rather than from the slowest function.
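To make the vector form concrete, here is a minimal sketch in Python of how PERCENTILE( Data ; NumberSequence x ) could share a single sort across all requested percentiles. The function name, the argument handling, and the linear-interpolation rule are my assumptions for illustration, not part of the proposal itself:

```python
def percentile_vector(data, xs):
    """Vectorised PERCENTILE sketch: one sort, shared by every
    requested percentile in the sequence xs.

    data -- a sequence of numbers
    xs   -- a sequence of numbers, all between 0 and 1
    """
    s = sorted(data)            # the single sort, done exactly once
    n = len(s)
    results = []
    for x in xs:
        if not 0.0 <= x <= 1.0:
            raise ValueError("each x must be between 0 and 1")
        # Linear interpolation between the two closest ranks
        # (assumed here; the standard would fix the exact rule).
        pos = x * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        results.append(s[lo] + frac * (s[hi] - s[lo]))
    return results
```

A caller asking for the quartiles, e.g. `percentile_vector(data, [0.25, 0.5, 0.75])`, pays for one sort instead of three.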
In my example, it is probably the 6.6 million unnecessary sort operations, or the 101 sort operations (depending on the algorithm used), where one single sort would have sufficed. Some other factors are probably at work, too.

The advantage of one function instead of dozens that, combined, do the same job is perhaps less obvious. I will explain:
- IF a cell gets modified, it broadcasts a message to the cells that depend on it (that reference it), telling them to recalculate.
- Let us now move to such a dependent cell:
-- a good implementation will first check IF this cell depends on other cells that have yet to be recalculated;
-- IF there are such cells, it will wait for those cells to be recalculated first, and only then recalculate itself.
- Using one compound function instead of multiple sub-functions has a clear advantage:
-- it waits for all the calculations it depends on to finish,
-- and updates only afterwards.
- With multiple sub-functions, some will recalculate, but later on they will need to recalculate again [we cannot predict IF the cells the function depends on will be modified later by one of the other recalculations].

With a parallel design you will most likely end up with cells waiting for other cells to be recalculated, and/or various cells being recalculated multiple times (instead of once). Some gain can still be obtained, but for a consistent gain it is far better to address some spreadsheet design issues first.

[Just as a side note: MEDIAN and PERCENTILE belong to the group of slower functions. The LOOKUP and OFFSET functions might belong to this group, too. However, it is probably not a single function that causes the trouble, but the update that triggers recalculating various cells dozens, thousands, or millions of times.]

> In other words, I wonder whether there is a significant boost in
> calculation performance we can get by just using the current generation
> of hardware better.
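The waiting rule described above can be sketched with a toy dependency graph. The cell names and both recalculation strategies below are illustrative assumptions, not any real spreadsheet's engine; the point is just to count how often each cell gets recomputed:

```python
from collections import defaultdict

# Toy dependency graph: cell -> cells it depends on (references).
deps = {
    "A1": [],
    "B1": ["A1"],
    "C1": ["A1", "B1"],
    "D1": ["B1", "C1"],
}

def recalc_order(deps):
    """The 'good implementation': a cell waits until every cell it
    depends on is done, i.e. cells are visited in topological order,
    so each cell is recalculated exactly once."""
    done, order = set(), []
    def visit(cell):
        if cell in done:
            return
        for d in deps[cell]:
            visit(d)            # wait for dependencies first
        done.add(cell)
        order.append(cell)
    for cell in deps:
        visit(cell)
    return order

def naive_recalc_count(deps):
    """The eager 'broadcast' strategy: a cell recalculates immediately
    every time any cell it references changes, so cells with several
    dependencies get recomputed several times."""
    counts = defaultdict(int)
    def changed(cell):
        counts[cell] += 1
        for c, ds in deps.items():
            if cell in ds:
                changed(c)      # dependents recalculate right away
    changed("A1")               # modifying A1 triggers the cascade
    return dict(counts)
```

With this graph, the topological order touches each of the four cells once, while the eager broadcast recomputes C1 twice and D1 three times after a single change to A1; with deeper chains the multiplicity grows quickly.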
One kind of hardware comes to mind that would perform significantly better: vector processors have significant potential, but to my knowledge the chances of getting one into a desktop system are fairly dim. Again, a structured spreadsheet is likely to compute faster than a free-style generated one.

> (and I wonder if there are some spreadsheet functions that can be
> off-loaded to a graphics card...)

Calculations done by the graphics card are - as far as I know - inaccurate (definitely less accurate than needed in spreadsheets). If you need accurate calculations, then you end up with a hyper-expensive intended-for-CAD graphics card.

> -Rob

Sincerely,

Leonard
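The precision gap can be illustrated by rounding every intermediate result to single precision, roughly what graphics hardware of that era computed in. This is an illustrative Python sketch of the general single- vs double-precision drift, not a measurement of any particular card:

```python
import struct

def to_float32(x):
    """Round a double-precision Python float to IEEE 754 single
    precision by packing and unpacking it as a 4-byte float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Accumulate 0.1 ten thousand times: once in double precision, once
# with every intermediate result rounded to single precision.
total64 = 0.0
total32 = 0.0
for _ in range(10000):
    total64 += 0.1
    total32 = to_float32(total32 + to_float32(0.1))

err64 = abs(total64 - 1000.0)
err32 = abs(total32 - 1000.0)
# err32 is larger than err64 by many orders of magnitude -- the kind
# of drift a long SUM() column would show in single precision.
```

For a spreadsheet, where a SUM over thousands of cells is routine, that accumulated single-precision error is clearly visible at display precision, which is the accuracy concern raised above.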