[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]
Subject: Re: [office-comment] office-formula: FFT functions?
Thanks for your thoughts, Rob. On Mon, 4 Feb 2008, robert_weir@us.ibm.com wrote: > This is an interesting idea. In practice,do you think this makes sense > as spreadsheet array functions which take an array and return an array? > Or as an application comment that prompts the user to select an input > and output range,along the lines of what Excel does with its Analysis > Toolkit? The advantage of automatic recalculation seems great enough to me, for things like filtering data or finding correlations, to weigh strongly in favor of making it a full-fledged array function. > My experience is that most users are not even aware of array functions > in Excel and how they use. On the other hand, most users are not aware > of FFT's either. I think it's reasonable to expect that any user who wants to use an FFT should know what an FFT is, and as a consequence they know it will output an array. In any case, the specification of the function is somewhat orthogonal to its user interface (implementations may choose to have some kind of "wizard" where you are prompted to select input and output ranges or something). > If done as a spreadsheet function, the FFT would obviously need to be > recalculated every time one of the cells in the input range changed. > At O(N log N) this isn't so bad. > > Do you know if there are any ways to calculate an FFT with incremental > changes? So I calculate once in O(N log N) and then change only some of > the input cells, a small portion of them -- is there a much faster way > of calculating the new results? In general, if you only update K of the input cells, then you can recompute the outputs in O(N log K) time (using "pruned FFT algorithms"). In practice, the savings in the log are small enough that, unless K is very small compared to N (< N/100 and for fairly large N), I find that they don't outweigh the loss in the constant factor that you get by (partially) abandoning highly-optimized canned FFT routines. (I don't know of any super-optimized pruned FFT routine.) There is also a simple way to update the output via the O(N K) naive definition, which is reasonable if only a couple input cells change. However, I would tend to suggest just sticking with the O(N log N) full FFT whenever it needs to be updated. First of all, this should be plenty fast in most cases -- on my Intel Core Duo, a 4096-point FFT takes under 40 microseconds with an optimized FFT (and even a very simple implementation should get within a factor of 10 of that); users are unlikely to even notice anything under a millisecond, and if they have million-point FFTs in their spreadsheets they shouldn't be surprised if it takes a second to update. Second, using the same algorithm in all cases is obviously a lot simpler to implement, and pruned FFTs especially are kind of a pain. Third, unless you are very careful, just doing the FFT every time is likely to be much more accurate: FFT algorithms have an O(log(N)) error bound for the floating-point error, whereas getting the same bound with the naive formula requires you to play games in how you do the summation. > What functions would you recommend? Maybe FFT() and the inverse IFFT(). > Each one might have a parameter that specifies the normalization > convention. My suggestion would be to just stick with the Matlab convention for normalization (no normalization for FFT, and a 1/N normalization factor for the IFFT). This choice has the huge advantage that a convolution of A and B is given by IFFT(FFT(A) * FFT(B)), which is not true for other normalization choices. If you let people choose the normalization, often they screw this up. Other than for convolutions/correlations, it's actually pretty rare in practical applications that one cares too much about the normalization, but if the user cares they can always write FFT(A) / 100.0 or whatever they want, right? After all, this is a spreadsheet, and multiplication by constants is a pretty familiar spreadsheet operation. No need to have a special parameter in the FFT function just to do that. Occam's razor: Entia non multiplicanda sunt sine necessita. > Maybe also a Power() function? I could imagine in the time > domain CrossCorrelation()and AutoCorrelation() would be useful as well. > These are interesting for fields beyond signal processing, including > economics. My inclination would be against a PowerSpectrum function -- if the user wants that, they should know enough to do Magnitude(FFT(A))^2 or whatever the syntax is in OpenFormula (I'm assuming you have a complex-magnitude function). Similarly for the phase, assuming you have a complex-argument function. (With this sort of thing, there are enough choices in normalization/units, whether you want the magnitude or its square, etcetera, that you are better off letting the user type the few extra characters to specify exactly what they want. Anyone who wants to compute a power spectrum should know what the FFT is, otherwise they won't be able to make sense of the result anyway. And again, entia non multiplicanda sunt.) However, I think that Correlation(A,B) and Convolution(A,B) functions would be very useful, so that you don't require people to understand the connection to FFTs in order to compute these. (Besides, they are trivial to implement once you have FFT and IFFT. Moreover, this way the user never has to see complex numbers if the data are real, and also the implementation can use fancier algorithms to improve the performance in that case.) I would argue against AutoCorrelation(A), since people can just do Correlation(A,A). Entia non multiplicanda sunt again. Probably one should also provide an FFTshift function, like fftshift in Matlab (www.mathworks.com/access/helpdesk/help/techdoc/ref/fftshift.html), to recenter the origin, since a lot of users would want this and it is kinda error-prone to implement properly yourself. And probably a function FFTfrequencies to output the set of frequencies given the sampling interval, since users always ask about this and inevitably screw it up. Basically, the principle I would tend to recommend would be to provide additional functions for things that are error-prone to implement by hand, or require some special understanding of the algorithms/transforms, or which enable large potential advantages in the implementation. > Note however that it is always a debatable point on whether a particular > feature request should be placed in the standard, or whether it is best > done as an extension. I'd be more encouraged to add features like this > to the ODF standard if we had one or more vendors committed to > supporting this feature in their product. I agree that you have to be careful of feature creep. However, there are several reasons why I think you should look hard at standardizing FFT functionality: * FFTs are important for a huge range of problems; it wasn't named one of the "top ten algorithms of the century" for nothing, and because it inputs and outputs simple arrays of numbers, many of its simpler applications are reasonably well-suited to spreadsheets. * It's very hard for users to implement themselves (nigh impossible if they restrict themselves to the spreadsheet language, I would think). * It's very likely to get implemented in some form in spreadsheets anyway (as Excel already has), but if you leave it up to the vendors they will create a mess. They will implement incompatible normalization conventions, sign conventions, output-ordering conventions, etcetera, some of them will only support certain values of N, some of them will implement it as an array function and some not, some of them will output just the DFT and some (like Excel's plug-in) will spit out additional columns of extraneous data like a set of frequencies. Even though you will have common functionality, spreadsheets using this functionality are extremely unlikely to be portable. Given a reference implementation of the FFT for arbitrary N and of the correlation/convolution functions, it should hopefully be straightforward for OOo etc. to implement. But I'm not sure how to proceed in getting the kinds of commitments you would like. (I would be happy to provide a reference C or C++ implementation under a permissive license like MIT or 2-clause BSD ... the implementation here can be simple because applications like spreadsheets where you only recalculate FFTs occasionally are not as demanding of FFT performance as doing DSP or solving PDEs etc, although one could optionally link to an optimized FFT like FFTW or Intel MKL for better performance.) Regards, Steven G. Johnson
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] | [List Home]