This online bonus material supplements the "Analyze This" feature story in the December 2007 issue of Electronic Musician.
Taming the Wild FFT
A Fast Fourier Transform (FFT) analysis is performed on a block of samples that represents a small chunk of time, and the number of samples in the block is expressed as the number of “points” (data points) in the FFT. The size of these blocks and the sampling rate of the signal determine two things: the frequency resolution of the analysis, and how long it takes to accumulate the data for each FFT. That duration is called either the “FFT time constant” or the “FFT time” (see Fig. A).
FFT time = N/SR
(N = the number of points selected and SR = the sampling rate of the signal.)
FIG. B: This useful chart translates the properties shown in Fig. A into recommendations on how to select the most appropriate window for your particular application.
For example, the FFT time of a 4,096-point FFT analysis of a 44.1 kHz signal is 93 ms. The frequency-domain analysis produced by an FFT contains half the number of points used in the time domain, so our example would produce N/2 = 2,048 frequency-domain points (called “frequency bins”).
FFT output is linearly spaced; that is, the center frequency of each band is a fixed number of hertz from the next, continuing across the available bandwidth, which, in digital audio, is the Nyquist frequency of SR/2:
FFT frequency resolution = (SR/2)/(N/2)
In our example, that turns out to be 10.77 Hz.
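The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formulas given in the text; the function names are made up for illustration and do not come from any analysis program.

```python
def fft_time_seconds(n_points, sample_rate):
    """Time needed to accumulate one block of samples: N / SR."""
    return n_points / sample_rate


def frequency_bins(n_points):
    """Number of frequency-domain points as counted in the text: N / 2."""
    return n_points // 2


def frequency_resolution_hz(n_points, sample_rate):
    """Spacing between bins: (SR / 2) / (N / 2), which simplifies to SR / N."""
    return (sample_rate / 2) / (n_points / 2)


if __name__ == "__main__":
    n, sr = 4096, 44100.0  # the example used in the text
    print(f"FFT time:   {fft_time_seconds(n, sr) * 1000:.1f} ms")
    print(f"Bins:       {frequency_bins(n)}")
    print(f"Resolution: {frequency_resolution_hz(n, sr):.2f} Hz")
```

Note that the two factors of 2 in the resolution formula cancel, so the bin spacing is simply SR/N, the reciprocal of the FFT time.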
Increasing the block size improves the frequency resolution, but it also places more strain on the CPU and incurs greater latency, because it takes more time to accumulate the larger number of samples in the block. Smaller block sizes produce a response that is more sensitive to short-duration events, while larger block sizes (which represent larger chunks of time) are less responsive but provide some smoothing action and give better low-frequency resolution. Generally, the largest block sizes are used only when very high resolution is needed.
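To see the trade-off in numbers, the following sketch tabulates FFT time and bin spacing for several block sizes at 44.1 kHz. The block sizes listed and the helper name `block_size_tradeoffs` are arbitrary choices for illustration, not settings taken from any particular analyzer.

```python
def block_size_tradeoffs(sample_rate, block_sizes):
    """Return (N, FFT time in ms, bin spacing in Hz) for each block size."""
    rows = []
    for n in block_sizes:
        fft_time_ms = 1000.0 * n / sample_rate  # latency grows with N
        resolution_hz = sample_rate / n         # bin spacing shrinks with N
        rows.append((n, fft_time_ms, resolution_hz))
    return rows


if __name__ == "__main__":
    sizes = [512, 1024, 4096, 16384, 65536]  # illustrative values only
    print(f"{'N':>6}  {'time (ms)':>10}  {'spacing (Hz)':>12}")
    for n, t_ms, res in block_size_tradeoffs(44100.0, sizes):
        print(f"{n:>6}  {t_ms:>10.1f}  {res:>12.2f}")
```

Each doubling of the block size halves the bin spacing and doubles the time needed to fill the block.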
One of the more esoteric parameters that users can vary in many analysis programs is the FFT window type. The use of windowing in FFTs is essentially an attempt to bandage a mathematical limitation of the FFT. (Wikipedia's entry on spectral leakage disputes this argument, but numerous academic sources espouse the point of view given here.)
When a frequency component of a signal has a convenient mathematical relationship to the block size of an FFT (such that an exact number of periods [period = 1/f, where f = the frequency] fits exactly within a single block), life is nice. A sine wave at that frequency is displayed by the FFT as a single line at the proper frequency (see Fig. B). If, however, a fractional number of periods fit in a block, the math works out less neatly, and the component shows up in the display as a narrow band of components instead. This problem is called “spectral leakage,” and given the complexity of real-world signals, there is no way, in practice, to fully eliminate it.
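Spectral leakage is easy to reproduce on a small scale. The sketch below uses a deliberately naive discrete Fourier transform (slow, but it gives the same result an FFT would) on a 64-point block, and compares a sine with exactly 8 periods in the block against one with 8.5 periods. The 64-point block, the -60 dB threshold, and all function names are assumptions made for the demonstration.

```python
import cmath
import math


def sine_block(cycles_per_block, n_points):
    """One block of a unit-amplitude sine with the given number of periods."""
    return [math.sin(2 * math.pi * cycles_per_block * t / n_points)
            for t in range(n_points)]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns the magnitudes of the first N/2 bins."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc) / (n / 2))  # scale so a full-scale sine reads 1.0
    return mags


def significant_bins(mags, threshold_db=-60.0):
    """Indices of bins within threshold_db of the strongest bin."""
    floor = max(mags) * 10 ** (threshold_db / 20)
    return [k for k, m in enumerate(mags) if m > floor]


if __name__ == "__main__":
    n = 64
    clean = significant_bins(dft_magnitudes(sine_block(8, n)))
    leaky = significant_bins(dft_magnitudes(sine_block(8.5, n)))
    print("8 periods per block:  ", clean)
    print("8.5 periods per block:", leaky)
```

With a whole number of periods, all of the energy lands in bin 8. With 8.5 periods, the energy is smeared across many bins, none of which sits at the true frequency.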
Applying a window function to taper the signal before the FFT analysis can reduce spectral leakage. But each window function makes its own trade-offs between the reduction of spectral leakage, the accuracy with which it depicts levels, and the resolution of the frequency. The most commonly used window is the Hanning (“Hann” or “von Hann”). Others include Hamming, rectangular, Blackman-Harris, Blackman, and triangular, to name a few.
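As a rough illustration of what tapering buys, this sketch applies a Hann window to a sine that does not fit the block evenly, then measures how far a bin well away from the peak sits below it, with and without the window. It assumes the "periodic" form of the Hann window; analysis programs may use a slightly different variant. The 64-point block, the choice of bin 20, and the function names are again only for demonstration.

```python
import cmath
import math


def hann_window(n_points):
    """Periodic Hann window: 0.5 - 0.5 * cos(2*pi*t / N) (assumed variant)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n_points)
            for t in range(n_points)]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns the magnitudes of the first N/2 bins."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2)]


def level_below_peak_db(samples, bin_index):
    """Level of one bin relative to the strongest bin, in dB (0 dB or less)."""
    mags = dft_magnitudes(samples)
    return 20 * math.log10(max(mags[bin_index], 1e-300) / max(mags))


if __name__ == "__main__":
    n = 64
    # 8.5 periods per block: the worst case for leakage
    sine = [math.sin(2 * math.pi * 8.5 * t / n) for t in range(n)]
    tapered = [x * w for x, w in zip(sine, hann_window(n))]

    far_bin = 20  # well away from the true frequency, between bins 8 and 9
    print(f"rectangular: {level_below_peak_db(sine, far_bin):.1f} dB")
    print(f"Hann:        {level_below_peak_db(tapered, far_bin):.1f} dB")
```

The windowed block shows far less energy in the distant bin. The price is a wider main peak and a lower peak reading, which is the kind of trade-off between leakage, level accuracy, and frequency resolution described above.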
The accompanying charts, lifted from the extremely informative Inspector XL manual, show the trade-offs of some different window types.