
Link to Category:FFT_algorithms

I would like this page to mention which algorithm is used by FFTW. This could include a link to "Category:FFT_algorithms" (I don't know how to make this link properly)

This isn't an FFT algorithm per se; it's an implementation of many FFT algorithms. Also, according to the FFTW Documentation, FFTW uses a variety of algorithms depending on the length and type of input. Charles Simpson 15:33, 1 December 2006 (UTC) reply
I've clarified the description of the algorithms used in FFTW in the article. (I normally don't edit this article, but since this was a simple question of fact that is well supported by published references, it seemed like WP:COI did not apply.) — Steven G. Johnson ( talk) 05:27, 29 December 2009 (UTC) reply
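
For illustration, here is a minimal sketch of the planning interface through which FFTW chooses its algorithms: the caller asks for a plan for a specific size, and the planner (here invoked with FFTW_MEASURE) times candidate algorithm combinations for that size on the current machine and keeps the fastest. The transform length and the compile command are arbitrary examples, not anything mandated by FFTW.

/* Minimal FFTW planning sketch; compile with e.g.: gcc example.c -lfftw3 -lm */
#include <complex.h>
#include <fftw3.h>

int main(void)
{
    const int n = 1000;                     /* arbitrary size; need not be a power of two */
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * n);

    /* FFTW_MEASURE makes the planner benchmark several algorithms for this
       size and machine; the arrays may be overwritten during planning. */
    fftw_plan p = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_MEASURE);

    /* ... fill in[] with data here, after planning ... */
    fftw_execute(p);                        /* computes the DFT of in[] into out[] */

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
    return 0;
}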

Bernstein "controversy"

I'd never heard of Dan Bernstein until I looked at this page. It seems to me that the mention of his "controversy" is redundant on this page. The link is to a web page of Bernstein's (i.e. not an article in a journal, or an independent web page) and having studied his library (and being well aware of FFTW), I am inclined to delete all mention of Bernstein and his library. Here are my reasons: unless someone can link in some third party sources on the controversy (i.e. not Bernstein's own material) I think the inclusion of the controversy violates NPOV. The Bernstein library hasn't been updated since 1999 and it is limited to powers of 2 lengths. Also, from reading its code, it seems to depend on providing a different function for each size of array. Great to use if your data array is a fixed size (and a power of 2) but no good if you sometimes want to pass a 256 element array and sometimes a 512 element array. FFTW in contrast, handles arbitrary sizes with a single call. Anyone else have a view? Sangwine 22:01, 12 December 2006 (UTC) reply

Do not delete this valuable historical information. The Bernstein-FFTW "controversy" was more of a cowboy feud and many records documenting it still exist on the Internet; search for them. The facts of the feud were that Bernstein's FFT code, even when you account for the out-of-order output, was faster at the time (1999) and that the FFTW guys refused to acknowledge this. Was it just academic disrespect or was it a coverup or was it just a bit too much cowboy mentality? It was probably a bit of all three. In any case, it is important FFTW folklore. Your comment about Bernstein's code being limited to powers of 2, while true, is completely irrelevant to this discussion. ( Requestion 16:47, 20 December 2006 (UTC)) reply
The fact that djbfft only handles a few special cases is very relevant to this discussion. IMHO, some Usenet posts by a disgruntled competitor don't deserve citation from the main FFTW page. Jon Harrop 11:05, 8 April 2007 (UTC) reply

As one of the FFTW authors, I have avoided this page (due to WP:COI) and don't really want to reopen an old flamewar, but I feel I should comment here. The facts which are undisputed, as far as I can tell, are:

  • Bernstein wrote (circa 1997) a code he called djbfft which computes the DFT with permuted output or input, i.e. the DFT followed or preceded by a permutation matrix, for a few 1d power-of-two sizes. (In-order output is irrelevant for certain applications.) djbfft was tuned primarily for the Pentiums and UltraSPARCs of that time.
  • When compared to FFTW at the time, which computes the DFT with in-order output, djbfft was faster on the machines he tuned for (e.g. the original Pentium and the UltraSPARC-I). This does not "account for the out-of-order output," and to my knowledge benchmarks of djbfft plus reordering of the output have never been released.
  • Although we felt (and feel) that djbfft solves a different problem than FFTW and other in-order FFT codes, we nevertheless thought it would be interesting to include in our benchmark graphs. However, we screwed up: we compiled djbfft with the same set of optimization flags as the other codes in our benchmark, including FFTW, whereas djb wanted people to compile with slightly specialized flags (e.g. djb preferred -O2 rather than -O3), and as a result djbfft was slowed down by 10-15% or so if I remember correctly. Bernstein (rightly) criticized us for this mistake, although I think he unfairly imputed malice. We subsequently removed djbfft from the benchmark entirely (at Bernstein's request).
  • To show the importance of the out-of-order vs. in-order transform computation and of specialization for fixed small sizes and strides, we released a library called "pfftw" on our web site which computes fixed-size, out-of-order DFTs of power-of-two consecutive data only, similar to djbfft. Using Bernstein's own benchmark numbers and compiler flags, pfftw beat djbfft on many (most?) of the platforms he tuned for (e.g. Pentium & UltraSPARC-I).
  • Subsequent to our release of pfftw in Nov. 1999, Bernstein has neither updated djbfft (its last release was in Sep. 1999) nor has he updated his benchmark or the claims on his web site.

DJB also had some other criticisms, e.g. he doesn't like the way we plot our graphs (he prefers tables of raw timing numbers whereas we find a speed rescaled by N log N more readable), but on these matters we continue to disagree. The claim that we have "refused to acknowledge" djbfft is false (we have linked to the djbfft website since 1997).

(Nowadays, both djbfft and pfftw are trounced on Intel processors by anything that uses SSE/ SSE2 instructions, such as recent FFTW versions or the Intel Math Kernel Libraries.)

To what extent Wikipedia should comment on this I leave up to others. However, I'm not sure that it's fair to simply link to DJB's page on the issue, which is (indisputably) one-sided and (arguably) out-of-date.

—Steven G. Johnson 20:29, 20 December 2006 (UTC) reply

OK, that comment (by Steven Johnson) is helpful. I don't see exactly how to resolve this, so I have flagged the controversy part of the article with what seems to me to be the appropriate tag (self-published source). If anyone can cite some independent material or add a third or fourth opinion, then maybe this issue can be resolved. Sangwine 16:09, 22 December 2006 (UTC) reply

I don't see what needs to be resolved. We have Bernstein's opinion and we have Johnson's opinion. Surprisingly both opinions are more similar than they are different. It isn't important who is right or wrong in this feud and it really isn't Wikipedia's place to decide either. What is important is that this controversy was real and not some made up fiction. We have two great sources of information straight from both horses' mouths and I think this is about as resolved as it possibly can get. We need to remember that the purpose of Wikipedia is to store a truthful record of history and sometimes parts of that history will be unflattering. What I worry about is what happens when history is deleted? Some of the mailing lists on the Internet that documented this event have been purged. What happens when they are all purged and Bernstein's pages are deleted? My hope is that Wikipedia retains an accurate record. ( Requestion 17:46, 22 December 2006 (UTC)) reply
How does the article include my opinion? (Remember also that Wikipedia is not intended as a source of material not available elsewhere, and in fact this would be against policy.) —Steven G. Johnson 17:48, 24 December 2006 (UTC) reply
Given that (a) there are no articles for individual FFTW versions, and (b) these data applied to a long-deprecated version of FFTW (and of another library that is no longer updated), would it be prudent to shuffle these away somewhere under ==Previous versions== or ==History== or some such? → ɛɹin 15:03, 17 July 2008 (UTC) reply

I'm removing the controversy section, mostly because of the ugly template. If anyone wants to include this material, then feel free to supply reliable third-party sources for it. Is this controversy notable? Then surely some third party has taken it up. Otherwise, I'm removing outdated self-published material as is the policy of the English Wikipedia. Thanks, Vesal ( talk) 11:52, 22 August 2008 (UTC) reply

Technical description rewritten

I have rewritten the technical description of FFTW:

1) Moved the description of the (wide range of) transforms that FFTW supports to the beginning of the technical description.

2) Added more information and wiki-links: Complex number, Big-O notation, prime factors, power of two, heuristics, etc.

3) The expression 'particularly strong feature' has been removed, since whether this evaluation holds depends on the needs of the FFTW user, and users may find other features of FFTW to be 'particularly strong' (e.g. the support for non-power-of-two arrays or the use of SIMD operations). However, since many users may benefit from this measurement-based planning feature, a significant part of the description is dedicated to it (see the sketch after this list).

4) The FFTW term guru is defined at the beginning of the last part of the technical description.

5) Highlighted FFTW terminology: wisdom, guru.
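
To accompany the terminology above, a minimal sketch of how wisdom is used: the results of the planner's measurements can be exported and re-imported so that the expensive FFTW_MEASURE search is done only once per machine. The filename-based helpers are assumed here to be the convenience functions of FFTW 3.3 and later (older releases expose fftw_export_wisdom_to_file(FILE *) instead); the file name is an arbitrary example.

/* Minimal wisdom sketch, assuming FFTW >= 3.3; compile with: gcc example.c -lfftw3 -lm */
#include <fftw3.h>

int main(void)
{
    const int n = 4096;
    fftw_complex *buf = fftw_malloc(sizeof(fftw_complex) * n);

    /* Reload any previously accumulated wisdom (returns 0 if the file is absent). */
    fftw_import_wisdom_from_filename("fftw_wisdom.dat");

    /* If wisdom for this problem already exists, planning is fast; otherwise
       FFTW_MEASURE runs its timing search and the result is added to the wisdom. */
    fftw_plan p = fftw_plan_dft_1d(n, buf, buf, FFTW_FORWARD, FFTW_MEASURE);

    /* Save the accumulated wisdom for future runs. */
    fftw_export_wisdom_to_filename("fftw_wisdom.dat");

    fftw_execute(p);
    fftw_destroy_plan(p);
    fftw_free(buf);
    return 0;
}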


Out-of-order transforms?

At least some FFT algorithms permute the transformed elements (by bit-reversing their index), so unpermuting them incurs an overhead.

When I started working with FFTs (on a CM-200 15 years ago), this overhead was large enough that the FFT library supported out-of-order transforms, i.e. transforms where the transformed elements were returned to the caller in the permuted order.

In many cases such an out-of-order transform requires no extra coding by the caller, e.g. when two arrays are transformed, then multiplied element-wise, and then transformed back.
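
For illustration, a minimal sketch of that use case: cyclic convolution of two signals via forward transforms, an element-wise product, and an inverse transform. It is written with FFTW's ordinary in-order complex interface (FFTW does not expose a permuted-output variant, as discussed below); the function name and normalization are illustrative choices. Since the frequency-domain product is element-wise, a consistent permuted ordering of both spectra would, in principle, give the same final result once the inverse transform undoes it.

/* Cyclic convolution sketch using FFTW's in-order interface. */
#include <complex.h>   /* included before fftw3.h so fftw_complex is C99 double complex */
#include <fftw3.h>

void cyclic_convolve(int n, fftw_complex *x, fftw_complex *y, fftw_complex *out)
{
    fftw_complex *X = fftw_malloc(sizeof(fftw_complex) * n);
    fftw_complex *Y = fftw_malloc(sizeof(fftw_complex) * n);

    /* FFTW_ESTIMATE so the caller's data is not overwritten during planning. */
    fftw_plan px   = fftw_plan_dft_1d(n, x, X,   FFTW_FORWARD,  FFTW_ESTIMATE);
    fftw_plan py   = fftw_plan_dft_1d(n, y, Y,   FFTW_FORWARD,  FFTW_ESTIMATE);
    fftw_plan pinv = fftw_plan_dft_1d(n, X, out, FFTW_BACKWARD, FFTW_ESTIMATE);

    fftw_execute(px);
    fftw_execute(py);

    for (int i = 0; i < n; ++i)     /* element-wise product in the frequency domain */
        X[i] *= Y[i];

    fftw_execute(pinv);             /* FFTW's inverse transform is unnormalized ... */
    for (int i = 0; i < n; ++i)
        out[i] /= n;                /* ... so scale by 1/n */

    fftw_destroy_plan(px);
    fftw_destroy_plan(py);
    fftw_destroy_plan(pinv);
    fftw_free(X);
    fftw_free(Y);
}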

Since FFTW seems to support all other possible (and impossible :-) variants of the FFT, it is natural to wonder why out-of-order transforms are not supported.

I googled fftw.org to no avail, so can anybody here answer this question? (I notice an FFTW author has been active on this page)

Thanks, Lklundin 23:09, 30 March 2007 (UTC). reply

FFTW's algorithms are implemented in such a way that a separate bit-reversal stage is not required. For out-of-place transforms, the output is directly written in the correct order. For in-place transforms, the permutations (a sequence of small transpositions) are generally interleaved/combined with the intermediate transform steps. There is still a price paid relative to what we could do with an out-of-order transform if you allowed us an arbitrary permutation, but it is not so easy to measure how big this price is because it's not simply a matter of omitting a certain stage of the transform—it would have to be implemented in quite a different fashion. We do plan to address this in the long run, but not by supporting permuted output—we feel that the Right Thing is to directly provide convolution routines (which is what permuted output is good for), since then we can use a number of tricks beyond just permuted output to speed things up.
In general, the philosophy of FFTW is to obtain as much performance as possible without sacrificing generality (e.g. we lose something by supporting non-consecutive "strided" data, relative to e.g. the Intel IPPS which specializes all code for fixed strides) and while not shifting the burden to the user by making non-trivial changes to the transform that is computed.
—Steven G. Johnson 16:45, 9 April 2007 (UTC) reply
Thanks for providing this insight into the inner workings of, and future plans for, FFTW.
As far as I understand the explanation, both the in- and out-of-place transforms manage to do the bit-reversal on the fly, i.e. an O(n) write/read sequence is avoided (or at least managed with temporal locality). Good.
I suspect, however, that one (probably the final) O(n) write sequence happens with some loss of spatial locality (due to the sequence of transpositions) and that this loss of locality increases the memory access time. I would expect this to be especially noticeable in a multi-processor MPI or ccNUMA (e.g. AMD Opteron) context.
I am therefore pleased to learn of the plan to support convolutions without the overhead of transpositions, and, having already studied the existing FFTW API with interest, I am also looking forward to seeing how the transposition-free convolutions will allow anti-aliasing (and DC removal).
AFAIK, the information provided on this talk page should not in itself be used in the article. As such I would prefer to update the mention of out-of-order transforms once an actual source (e.g. fftw.org) provides information about overhead (or support).
Lastly, I will take the opportunity to thank the authors for providing the open-source community with such a well-performing library.
Lklundin 20:47, 11 April 2007 (UTC) reply
On a distributed-memory machine with MPI, quite different code/algorithms are needed, since you need to do explicit message passing. FFTW's MPI transforms do support out-of-order output, precisely because the communications (a distributed transpose) are so expensive in this case. This is documented in the manual. The information on how FFTW avoids a separate bit-reversal step, for both in- and out-of-place transforms, is in the Proc. IEEE paper. —Steven G. Johnson 22:09, 11 April 2007 (UTC) reply
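For illustration, a minimal sketch of transposed (out-of-order) output in FFTW's MPI interface. This assumes the FFTW 3.3 MPI API (the interface discussed above, in 2007, was older, though the idea is the same): the FFTW_MPI_TRANSPOSED_OUT flag leaves the output transposed across processes, skipping the expensive final distributed transpose.

/* Transposed-output 2d MPI DFT sketch, assuming FFTW >= 3.3.
   Compile with e.g.: mpicc example.c -lfftw3_mpi -lfftw3 -lm */
#include <fftw3-mpi.h>

int main(int argc, char **argv)
{
    const ptrdiff_t n0 = 512, n1 = 512;     /* arbitrary global transform size */
    ptrdiff_t alloc_local, local_n0, local_0_start, local_n1, local_1_start;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();

    /* Size of this process's slab, for a transposed output distribution. */
    alloc_local = fftw_mpi_local_size_2d_transposed(n0, n1, MPI_COMM_WORLD,
                                                    &local_n0, &local_0_start,
                                                    &local_n1, &local_1_start);
    fftw_complex *data = fftw_alloc_complex(alloc_local);

    /* FFTW_MPI_TRANSPOSED_OUT: the output is left as an n1 x n0 array
       distributed along n1, avoiding the final distributed transpose. */
    fftw_plan p = fftw_mpi_plan_dft_2d(n0, n1, data, data, MPI_COMM_WORLD,
                                       FFTW_FORWARD,
                                       FFTW_MEASURE | FFTW_MPI_TRANSPOSED_OUT);

    /* ... fill the local input slab (local_n0 rows of length n1) here ... */
    fftw_execute(p);

    fftw_destroy_plan(p);
    fftw_free(data);
    MPI_Finalize();
    return 0;
}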
I have changed "FFTW does not support" to "FFTW has limited support". I believe "limited" is needed because: 1) The permutation of the result is (documented to be) undocumented, 2) Large prime sizes are not supported by the MPI version of FFTW, 3) The user would need the MPI runtime environment even when running on a uniprocessor.
Regarding your statement about the need for other algorithms in an MPI context, I would hope that FFTW applies its usual (and useful) method of algorithm choice based on measurements or estimates, since the use of MPI alone would hardly guarantee that a switch from the optimal uniprocessor algorithm will actually be faster.
On that note I wonder if I could interest you in an article comparing two Conjugate gradient method algorithms on more than a dozen very different MPI platforms. Even though the algorithms have nothing to do with FFT I believe the article demonstrates that the FFTW approach for algorithm choice is also very useful in an MPI context:
Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær (1999). "Parallel computation of rotating flows". Mathematical Modelling and Analysis. 4: 124–134.
-- Lklundin 18:45, 17 April 2007 (UTC) reply

Known to be the fastest?

The main article includes the sentence 'FFTW is known as the fastest free software implementation of the Fast Fourier transform (FFT) algorithm'.

Although the FFTW website lists extensive benchmarks to support this claim, the sentence may still need some improvement, since the passive voice ('is known') is listed under Wikipedia:AWT.

An anonymous editor at 67.160.224.159 has twice made unexplained (and poorly worded) changes to the sentence, but I find these changes too negative, especially in light of the provided benchmarks. I have therefore reverted these two changes, hoping for something better.

Lklundin ( talk) 17:51, 6 February 2008 (UTC) reply

Actually, our web site only states that FFTW's performance "is typically superior to that of other publicly available FFT software" (emphasis added). The question of the "fastest" program is somewhat ill-posed, because it depends on the transform size and the machine (and to a lesser degree on the compiler etc.), and there is no single program that is the fastest under every circumstance. However, we obviously tried to be the fastest as often as possible, and when we are not the fastest to at least be close, and I think our benchmarks testify to our success in this regard. (On x86, it is usually the fastest by a wide margin compared to other free FFTs, since other free FFTs do not fully exploit SSE/SSE2, with the exception of CMU's somewhat specialized-use SPIRAL code generator.)
That said, I'm not completely sure how to solve your problem here. You can, of course, cite our benchmarks, or the benchmarks and similar claims in our Proc. IEEE paper which was published in a refereed journal (and hence more reliable by Wikipedia's standards), but of course it is fair to point out that these publications are by the authors of FFTW. —Steven G. Johnson ( talk) 18:14, 6 February 2008 (UTC) reply
Cite the refereed journal. Change the article to use language that reflects that of the journal, being sure to avoid sillinesses like "fastest". 84.12.134.165 ( talk) 12:26, 1 July 2008 (UTC) reply
Perhaps saying "one of the fastest" would have been better than just "fastest", although it doesn't actually say much. 78.1.98.43 ( talk) 11:58, 4 January 2011 (UTC) reply
I'm pretty sure the claim of FFTW being the fastest package is restricted to the Western region of the US. We see a significant improvement using NVidia's packages and architecture. There is something to be said about architecture that has a significant effect on real world performance.-- Frozenport ( talk) 14:59, 6 December 2013 (UTC) reply
The "West" in the FFTW title is whimsical ( see the FAQ). And FFTW is only claimed to "perform well on most architectures"; there are certainly specialized architectures where it does not run at all (e.g. GPUs) or architectures where it performs poorly (e.g. embedded CPUs lacking floating-point hardware). — Steven G. Johnson ( talk) 15:43, 6 December 2013 (UTC) reply

link to User:Stevenj not appropriate?

It doesn't seem appropriate for the article to link to my userpage; doesn't that violate Wikipedia:Self-references to avoid? —Steven G. Johnson ( talk) 00:12, 6 November 2008 (UTC) reply

correction

FFTW ... is a software library for computing discrete Fourier transforms (DFTs), developed by Matteo Frigo and Steven G. Johnson at the MIT Laboratory for Computer Science.

Matteo Frigo was at the MIT Laboratory for Computer Science (LCS) when FFTW was first developed, but I was in the MIT Physics Department. (I'm currently in MIT's Department of Mathematics, and Matteo has had several different positions, so there is not presently any direct association between FFTW and LCS/ CSAIL.)

—Steven G. Johnson ( talk) 05:28, 5 December 2008 (UTC) reply

External Links

I'm not sure if Wikipedia is the place for this, but I wanted to know if it would be appropriate to also add an external link showing a basic C implementation using the FFTW libraries. I have been using my implementation of FFTW for a while now at Griffith University, Australia, for signal processing, and I constantly have students asking me how to implement the most basic FFT and IFFT in C. And I thought, since Wikipedia is usually their first port of call when researching, it could also point them in the right direction for implementation. Something like: C Programming: The reconstruction of the Fast Fourier transform of a real signal using the FFTW libraries for the most basic implementation, or something similar.

—Roger Chappel ( talk) 19:48, 17 June 2011 (UTC) reply
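
For readers asking for the kind of basic example described above, here is a minimal sketch of a forward real-to-complex transform and the inverse complex-to-real transform that reconstructs the real signal, using FFTW's standard r2c/c2r interface. The signal length and test sinusoid are arbitrary; this is only an illustrative sketch, not the external tutorial proposed above.

/* Basic real FFT and inverse (reconstruction) sketch; compile with: gcc example.c -lfftw3 -lm */
#include <math.h>
#include <fftw3.h>

int main(void)
{
    const int n = 256;
    const double pi = 3.14159265358979323846;
    double       *signal   = fftw_malloc(sizeof(double) * n);
    fftw_complex *spectrum = fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1));  /* r2c output size */

    fftw_plan fwd = fftw_plan_dft_r2c_1d(n, signal, spectrum, FFTW_ESTIMATE);
    fftw_plan inv = fftw_plan_dft_c2r_1d(n, spectrum, signal, FFTW_ESTIMATE);

    for (int i = 0; i < n; ++i)          /* example input: a single sinusoid */
        signal[i] = sin(2 * pi * 5 * i / n);

    fftw_execute(fwd);                   /* signal -> spectrum */
    fftw_execute(inv);                   /* spectrum -> signal, scaled by n */

    for (int i = 0; i < n; ++i)          /* undo FFTW's unnormalized scaling */
        signal[i] /= n;

    fftw_destroy_plan(fwd);
    fftw_destroy_plan(inv);
    fftw_free(signal);
    fftw_free(spectrum);
    return 0;
}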

Link to MIT Technology Licensing Office now redirects to generic webpage

Hi all, I noticed that, currently, reference 5 (which is the source for how "[FFTW] is also licensed commercially (for a cost of up to $12,500) by MIT") as-linked ( http://technology.mit.edu/technologies/15054_fastest-fourier-transform-in-the-west-fftw-version-3-3-x-fftw-v-3-3-x) redirects to a general landing page. Should we update the source for that information to the MIT TLO's current page on FFTW? The latest link is: https://tlo.mit.edu/technologies/fftw-fastest-fourier-transform-west.

Alternatively, we could add a link to an archived version of the old web page. (I'm new to Wikipedia, so I'm not sure if one option is preferred over another.) Here is the Internet Archive link: https://web.archive.org/web/20170128024924/http://technology.mit.edu/technologies/15054_fastest-fourier-transform-in-the-west-fftw-version-3-3-x-fftw-v-3-3-x. This is the latest archived copy of the webpage that doesn't redirect the same way it does now.

Any guidance? Thanks in advance!

QB2k ( talk) 11:54, 3 February 2023 (UTC) reply

@ QB2k: Both are valid options. In this case, I prefer to update the link to https://tlo.mit.edu/technologies/fftw-fastest-fourier-transform-west because it shows that the statement is still up-to-date. Dexxor ( talk) 10:22, 5 February 2023 (UTC) reply
Thank you very much for the help, @ Dexxor! I agree with your suggestion. I'll go ahead and make the change.
QB2k ( talk) 12:20, 7 February 2023 (UTC) reply