office-comment message

Subject: Re: [office-comment] office-formula: FFT functions?
From: "Steven G. Johnson" <stevenj@fftw.org>
To: robert_weir@us.ibm.com
Date: Mon, 4 Feb 2008 15:49:23 -0500 (EST)
Thanks for your thoughts, Rob.

On Mon, 4 Feb 2008, robert_weir@us.ibm.com wrote:
> This is an interesting idea.  In practice,do you think this makes sense 
> as spreadsheet array functions which take an array and return an array? 
> Or as an application comment that prompts the user to select an input 
> and output range,along the lines of what Excel does with its Analysis 
> Toolkit?

The advantage of automatic recalculation seems great enough to me, for 
things like filtering data or finding correlations, to weigh strongly in 
favor of making it a full-fledged array function.

> My experience is that most users are not even aware of array functions 
> in Excel and how they use.  On the other hand, most users are not aware 
> of FFT's either.

I think it's reasonable to expect that any user who wants to use an FFT 
should know what an FFT is, and as a consequence they know it will output 
an array.  In any case, the specification of the function is somewhat 
orthogonal to its user interface (implementations may choose to have some 
kind of "wizard" where you are prompted to select input and output ranges 
or something).

> If done as a spreadsheet function, the FFT would obviously need to be 
> recalculated every time one of the cells in the input range changed. 
> At O(N log N) this isn't so bad.
>
> Do you know if there are any ways to calculate an FFT with incremental 
> changes?  So I calculate once in O(N log N) and then change only some of 
> the input cells, a small portion of them -- is there a much faster way 
> of calculating the new results?

In general, if you only update K of the input cells, then you can 
recompute the outputs in O(N log K) time (using "pruned FFT algorithms"). 
In practice, the savings in the log are small enough that, unless K is 
very small compared to N (< N/100 and for fairly large N), I find that 
they don't outweigh the loss in the constant factor that you get by 
(partially) abandoning highly-optimized canned FFT routines.  (I don't 
know of any super-optimized pruned FFT routine.)

There is also a simple way to update the output via the O(N K) naive 
definition, which is reasonable if only a couple input cells change.

However, I would tend to suggest just sticking with the O(N log N) full 
FFT whenever it needs to be updated.  First of all, this should be plenty 
fast in most cases -- on my Intel Core Duo, a 4096-point FFT takes under 
40 microseconds with an optimized FFT (and even a very simple 
implementation should get within a factor of 10 of that); users are 
unlikely to even notice anything under a millisecond, and if they have 
million-point FFTs in their spreadsheets they shouldn't be surprised if it 
takes a second to update.  Second, using the same algorithm in all cases 
is obviously a lot simpler to implement, and pruned FFTs especially are 
kind of a pain.  Third, unless you are very careful, just doing the FFT 
every time is likely to be much more accurate: FFT algorithms have an 
O(log(N)) error bound for the floating-point error, whereas getting the 
same bound with the naive formula requires you to play games in how you do 
the summation.

> What functions would you recommend?  Maybe FFT() and the inverse IFFT().
> Each one might have a parameter that specifies the normalization
> convention.

My suggestion would be to just stick with the Matlab convention for 
normalization (no normalization for FFT, and a 1/N normalization factor 
for the IFFT).  This choice has the huge advantage that a convolution of A 
and B is given by IFFT(FFT(A) * FFT(B)), which is not true for other 
normalization choices.  If you let people choose the normalization, often 
they screw this up.

Other than for convolutions/correlations, it's actually pretty rare in 
practical applications that one cares too much about the normalization, 
but if the user cares they can always write FFT(A) / 100.0 or whatever 
they want, right?  After all, this is a spreadsheet, and multiplication by 
constants is a pretty familiar spreadsheet operation.  No need to have a 
special parameter in the FFT function just to do that.  Occam's razor: 
Entia non multiplicanda sunt sine necessita.

> Maybe also a Power() function?  I could imagine in the time
> domain CrossCorrelation()and AutoCorrelation() would be useful as well.
> These are interesting for fields beyond signal processing, including
> economics.

My inclination would be against a PowerSpectrum function -- if the user 
wants that, they should know enough to do Magnitude(FFT(A))^2 or whatever 
the syntax is in OpenFormula (I'm assuming you have a complex-magnitude 
function).  Similarly for the phase, assuming you have a complex-argument 
function.  (With this sort of thing, there are enough choices in 
normalization/units, whether you want the magnitude or its square, 
etcetera, that you are better off letting the user type the few extra 
characters to specify exactly what they want.  Anyone who wants to compute 
a power spectrum should know what the FFT is, otherwise they won't be able 
to make sense of the result anyway.  And again, entia non multiplicanda 
sunt.)

However, I think that Correlation(A,B) and Convolution(A,B) functions 
would be very useful, so that you don't require people to understand the 
connection to FFTs in order to compute these.  (Besides, they are trivial 
to implement once you have FFT and IFFT.  Moreover, this way the user 
never has to see complex numbers if the data are real, and also the 
implementation can use fancier algorithms to improve the performance in 
that case.)  I would argue against AutoCorrelation(A), since people can 
just do Correlation(A,A).  Entia non multiplicanda sunt again.

Probably one should also provide an FFTshift function, like fftshift in 
Matlab (www.mathworks.com/access/helpdesk/help/techdoc/ref/fftshift.html), 
to recenter the origin, since a lot of users would want this and it is 
kinda error-prone to implement properly yourself.  And probably a function 
FFTfrequencies to output the set of frequencies given the sampling 
interval, since users always ask about this and inevitably screw it up.

Basically, the principle I would tend to recommend would be to provide 
additional functions for things that are error-prone to implement by hand, 
or require some special understanding of the algorithms/transforms, or 
which enable large potential advantages in the implementation.

> Note however that it is always a debatable point on whether a particular 
> feature request should be placed in the standard, or whether it is best 
> done as an extension.  I'd be more encouraged to add features like this 
> to the ODF standard if we had one or more vendors committed to 
> supporting this feature in their product.

I agree that you have to be careful of feature creep.  However, there are 
several reasons why I think you should look hard at standardizing FFT 
functionality:

* FFTs are important for a huge range of problems; it wasn't named one of
   the "top ten algorithms of the century" for nothing, and because
   it inputs and outputs simple arrays of numbers, many of its
   simpler  applications are reasonably well-suited to spreadsheets.

* It's very hard for users to implement themselves (nigh impossible if
   they restrict themselves to the spreadsheet language, I would think).

* It's very likely to get implemented in some form in spreadsheets
   anyway (as Excel already has), but if you leave it up to the vendors
   they will create a mess.  They will implement incompatible normalization
   conventions, sign conventions, output-ordering conventions, etcetera,
   some of them will only support certain values of N, some of them will
   implement it as an array function and some not, some of them will
   output just the DFT and some (like Excel's plug-in) will spit out
   additional columns of extraneous data like a set of frequencies.
   Even though you will have common functionality, spreadsheets
   using this functionality are extremely unlikely to be portable.

Given a reference implementation of the FFT for arbitrary N and of the 
correlation/convolution functions, it should hopefully be straightforward 
for OOo etc. to implement.  But I'm not sure how to proceed in getting the 
kinds of commitments you would like.

(I would be happy to provide a reference C or C++ implementation under a 
permissive license like MIT or 2-clause BSD ... the implementation here 
can be simple because applications like spreadsheets where you only 
recalculate FFTs occasionally are not as demanding of FFT performance as 
doing DSP or solving PDEs etc, although one could optionally link to an 
optimized FFT like FFTW or Intel MKL for better performance.)

Regards,
Steven G. Johnson
References:
- Re: [office-comment] office-formula: FFT functions?
  - From: robert_weir@us.ibm.com