FIG. 1: This figure shows the impulse responses of two filters. The top track illustrates the impulse, the middle track shows the IR of a narrow bandpass filter, and the bottom track represents the IR of a shallow-slope lowpass filter.
FIG. 2: A sequence of impulses (top) is convolved with a sample of a Japanese temple block (middle). The bottom track shows the convolution.
FIG. 3: A sequence of temple-block hits (top) is convolved with a cymbal sample (middle), generating a sequence of hybrid “cymblock” sounds (bottom).
Have you ever used a gong as a reverb unit, or listened to a flock of birds singing a violin arpeggio? Did you ever wonder what sound you’d get if you could pour water through a cymbal? What if Howlin’ Wolf had recorded inside a motorcycle engine instead of the Chess Records studio?
Don’t worry, EM has not been taken over by a gang of surrealist poets. But a poet might actually be helpful in evoking the strange and wonderful quality of sonic hybrids (like those I mentioned) that you can produce through convolution. When two signals are convolved, their spectra are multiplied. The output signal partakes of the timbral and temporal attributes of both sources, and convolution coils the signals together inextricably.
Engineers have known convolution as a fundamental operation of digital signal processing (DSP) for decades. However, information about convolution hasn’t yet reached many musicians outside the academic and research communities. We hope to change that a bit, because many of its applications—including reverberation and other spatial effects, filtering, and cross-synthesis—are of interest to electronic musicians.
Convolution and Spectrum Multiplication
Strictly speaking, the term convolution refers to a sample-by-sample operation on two signals; this is called direct convolution. I won’t discuss the details of direct convolution, because it is terribly inefficient for all but the shortest signals and is therefore seldom used for the kinds of sounds we’re interested in. Instead, convolution software usually implements an analysis/resynthesis process called spectrum multiplication.
Spectrum multiplication is mathematically equivalent to direct convolution. The process begins with a fast Fourier transform (FFT) analysis of the spectra of two input signals. The analyzed spectra are then multiplied. Finally, the output signal is resynthesized through a process called inverse FFT (IFFT). This may sound like a lot of computation, but a modern computer processor can scream through a lengthy spectrum multiplication in almost no time.
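If you’re curious what this FFT-multiply-IFFT process looks like in practice, here is a minimal sketch in Python with NumPy. (The function name, the test signals, and the use of NumPy are my own illustrative choices, not anything from a particular convolution program.)

```python
import numpy as np

def spectrum_multiply(x, h):
    """Convolve two signals by multiplying their spectra: FFT, multiply, inverse FFT."""
    n = len(x) + len(h) - 1          # length of the full convolution result
    X = np.fft.rfft(x, n)            # zero-padded FFT of each input signal
    H = np.fft.rfft(h, n)
    return np.fft.irfft(X * H, n)    # inverse FFT resynthesizes the output

# The result matches direct (sample-by-sample) convolution:
x = np.array([1.0, 0.5, -0.25])
h = np.array([0.5, 0.5])
y = spectrum_multiply(x, h)
assert np.allclose(y, np.convolve(x, h))
```

The zero padding to the combined length is what makes the spectrum multiplication mathematically equivalent to direct convolution rather than a circular (wraparound) version of it.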
When the spectra of two signals are multiplied, like frequencies reinforce each other, while unlike frequencies weaken or disappear. This effect is called spectral intersection. In an effective convolution, the two input signals should have at least some energy in a common frequency range. If you convolve piano and clarinet samples, both at middle C, the spectra will have many common frequencies. Odd-numbered harmonics, which are abundant in the clarinet tone, will be strongly reinforced in the output spectrum.
But if you convolve the highest note on a piano with the lowest note of a bass clarinet, you’ll get a rather faint signal because only the clarinet’s weak upper harmonics will intersect with the piano note’s spectrum. Looking at it in another way, you could say that the piano spectrum had “filtered out” the bass clarinet’s fundamental and lower harmonics. Convolution is, in fact, intimately tied to filtering, as you’re about to see.
For the purposes of this article, spectrum multiplication and convolution can be considered synonymous. From now on, I’ll stick to the simpler term convolution.
Filtering & Impulse Response
In theory, convolution can reproduce the effects of any sort of filter—for example, lowpass, highpass, or even specific vintage filters like the Minimoog voltage-controlled filter (VCF). Before I delve into that sort of thing, though, let’s go over some basic filtering concepts.
Filters are usually characterized by their frequency response curves. One way to obtain a frequency response curve is to pass white noise through a filter. Analysis of the output spectrum shows how the filter attenuates various frequency regions.
You can also think about how filters behave over time. A filter’s impulse response (IR) is a measurement that embodies the same information as the frequency response but views it in the time domain. To obtain the IR, you feed an extremely short impulse, such as a gunshot, into the filter. Theoretically, the ideal impulse—an infinitely short one—would include energy at all frequencies, as does white noise. Engineers use the filter’s output signal (the IR) in order to measure a filter’s response to transients and observe whether it rings, or oscillates, at certain frequencies.
In the real world, the ideal impulse is approximated by a very short transient. Figure 1 illustrates an impulse that was input to two filters, showing the resultant IRs. The top track contains the impulse; below it is the impulse response of a narrow bandpass filter with a center frequency of 1 kHz. The bottom signal represents the IR of a lowpass -12 dB/octave filter that has a cutoff frequency of 1 kHz. These two filters, which have very different frequency responses, also have very different IRs. The bandpass filter has a long IR—typical of filters with a narrow frequency response—and rings for more than 7 ms. Filters with a wider frequency response and a smooth slope tend to have a short IR, as shown by the lowpass output in the figure.
The relationship between convolution and filtering is far from obvious, so you’ll have to take the following key concept on faith:
Axiom 1: The output of any filter is the convolution of its input signal with its impulse response. (See the sidebar “Further Reading” for sources of mathematical proof.)
It follows from Axiom 1 that you could use convolution to reconstruct any filter. To capture the characteristics of the Minimoog VCF, for instance, you would need only to dial up the desired settings on the filter, patch in an impulse source, and record the output. By convolving any signal with the recorded IR, you could obtain the sound of that signal filtered by the Minimoog. The most difficult part of this process would be finding a Minimoog!
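You can demonstrate Axiom 1 for yourself with a few lines of NumPy. This sketch uses a simple one-pole recursive lowpass (a stand-in for any filter, Minimoog or otherwise): it measures the filter’s IR by feeding it a unit impulse, then shows that convolving a signal with that IR gives the same output as running the signal through the filter directly.

```python
import numpy as np

def one_pole_lowpass(x, a=0.9):
    """A simple recursive (one-pole) lowpass filter: y[n] = (1-a)*x[n] + a*y[n-1]."""
    y = np.zeros(len(x))
    prev = 0.0
    for i, s in enumerate(x):
        prev = (1 - a) * s + a * prev
        y[i] = prev
    return y

# Measure the impulse response by feeding in a unit impulse.
impulse = np.zeros(512)
impulse[0] = 1.0
ir = one_pole_lowpass(impulse)

# Axiom 1: filtering a signal is the same as convolving it with the IR.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
filtered = one_pole_lowpass(x)
convolved = np.convolve(x, ir)[:len(x)]
assert np.allclose(filtered, convolved)
```

Capturing a hardware filter’s IR, as described above, is the same idea with a recorded impulse instead of a mathematical one.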
There’s a catch, though: your IR recording would capture the behavior of the Minimoog at one particular setting. Convolution is therefore impractical for variable or dynamic filtering, because it doesn’t offer the parametric control available on the average synthesizer filter or equalizer.
If we broaden our view of filters and impulse responses, things become even more interesting. The following truth is, I hope, self-evident:
Axiom 2: Any system through which a signal might pass can be considered a filter.
Many acoustic and electronic systems, though not designed to be filters, have filtering effects. Concert halls, amplifiers, and microphones are a few well-known (and often-cursed) examples of this. When we speak of a concert hall as “bright,” we’re thinking in the frequency domain—that the hall’s resonances reinforce high frequencies. When sound technicians clap their hands to check how live a hall is, they’re thinking in the time domain. The hand-clap test is an informal measurement of the hall’s IR.
Reverb designers sometimes use convolution to study and simulate acoustic spaces. First, an IR recording is made in the space. The impulse source might be an electronic signal played back over a speaker, or an acoustic event such as the firing of a starter pistol. When a test signal (say, an instrumental recording made in an anechoic chamber) is convolved with the IR, the reverberant characteristics of the space are reproduced with remarkable accuracy. So if the test signal is a marimba recording, and the IR was recorded in Boston’s Symphony Hall, the convolution sounds like that marimba being played in Symphony Hall. In fact, a good IR recording and a convolution program are the ingredients you need for an instant reverb unit of sorts. (If you want to try this technique, make sure you get permission before strolling into your local cathedral with a revolver and a portable DAT!)
A twist on this approach to reverb is the dynamic room effect. Suppose you recorded an IR in Symphony Hall and—because you were there anyway—from the hall’s restroom as well. Suppose further that you took these recordings home and made an interpolation (that is, a crossfade) between the two IRs. If you convolved a marimba recording with the interpolated IR, you’d get the sound of the marimba in a room whose shape, size, and construction materials all mutated over time.
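The interpolation itself is nothing more than a weighted mix of the two IRs. The sketch below uses synthetic decaying noise as stand-ins for the two hypothetical recordings (the hall and the restroom), since I obviously don’t have the real IRs; the decay times are arbitrary illustrative values.

```python
import numpy as np

# Stand-ins for two real IR recordings: decaying noise with a long
# "hall" tail and a short "restroom" tail (decay lengths are arbitrary).
rng = np.random.default_rng(1)
n = 4096
ir_hall = rng.standard_normal(n) * np.exp(-np.arange(n) / 1500.0)
ir_room = rng.standard_normal(n) * np.exp(-np.arange(n) / 200.0)

def interpolate_irs(ir_a, ir_b, position):
    """Linear crossfade between two IRs; position runs from 0 (all A) to 1 (all B)."""
    return (1 - position) * ir_a + position * ir_b

# An IR "halfway between" the hall and the restroom:
ir_mid = interpolate_irs(ir_hall, ir_room, 0.5)
```

Convolving the source with a series of such intermediate IRs, one per short segment, produces the mutating-room effect described above.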
A head-related transfer function (HRTF) is a special IR recording made with mics located in the ears of a dummy head. Such a recording preserves effects of sound reflections off the head, outer ears, and shoulders of a listener. These reflections cause short time delays that produce a comb-filter effect, which provides cues to the three-dimensional location of the sound source relative to the listener. (For an introduction to spatial hearing, see “Square One: Lost in Space” in the May 1999 issue of EM.) To simulate these cues over headphones or near-field speakers, signals are convolved with HRTFs, giving the listener the illusion of hearing sounds located in a 3-D space. This technique is often used in computer games, training simulations, and virtual-reality applications.
An easy way to become acquainted with HRTFs is to try the Binaural Processor in Tom Erbe’s Macintosh-based SoundHack (see the sidebar “Convolution Tools”). If you’re technically inclined, a library of HRTFs in raw 16-bit format is available from the Massachusetts Institute of Technology’s Media Lab at sound.media.mit.edu/KEMAR.html. These HRTFs need to be decompressed, converted to a usable audio file format, and convolved with other signals through your software of choice. Check the Web site’s FAQ for conversion instructions.
The Universal IR
If you extend the concept of the impulse response, a quite different range of effects becomes available. You can think about many acoustic sound sources in terms of an excitation/response model. Instruments such as tom-toms, bass drums, and wood blocks are good examples because they consist of resonant bodies that make a sound when struck. The excitation produced by striking the instrument can be considered a broadband impulse. The instrument body, which usually resonates in a narrow range of frequencies, acts as a filter. The sound of the instrument is, in fact, its impulse response. To generalize:
Axiom 3: Any signal can be thought of as the impulse response of a (possibly imaginary) filter.
Figure 2 illustrates how this idea can be used to construct a simple rhythmic sequence. Here the IR signal is a sample of a Japanese temple block (middle track). A sequence of impulses (top track) is convolved with this IR; the bottom track shows the convolution. The result is a sequence of copies of the temple-block signal that is rhythmically identical to the impulse sequence. The amplitude of each copy is proportional to that of the corresponding impulse. In effect, the impulses “trigger” the temple-block sounds.
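The triggering effect in Figure 2 falls straight out of the math: convolving a signal with a single impulse of amplitude a, delayed by d samples, produces a copy of the signal scaled by a and shifted by d. Here is a small sketch, with a synthetic decaying sine standing in for the temple-block sample (the pitch, decay, and hit positions are my own invented values).

```python
import numpy as np

sr = 44100
t = np.arange(int(0.05 * sr)) / sr
# Synthetic stand-in for the temple block: a fast-decaying sine near G5.
block = np.sin(2 * np.pi * 784 * t) * np.exp(-t * 60)

# A one-second impulse sequence with three "hits" at different amplitudes.
impulses = np.zeros(sr)
for pos, amp in [(0, 1.0), (11025, 0.5), (22050, 0.8)]:
    impulses[pos] = amp

out = np.convolve(impulses, block)

# Each impulse produces a scaled copy of the sample at its own position.
assert np.allclose(out[11025:11025 + len(block)], 0.5 * block)
```

Because the hits here don’t overlap, each copy is exact; with denser sequences the copies simply sum, which is what makes the overlapping cymblock hits in Figure 3 possible.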
The convolution track thus sounds like a recording of someone tapping an actual temple block with a stick. This sequence can be varied in several ways. If you were to prefilter each impulse by a different amount, the brightness of the temple-block hits would vary. To vary the timbre even more, different signals could be substituted for, or alternated with, the electronic impulses; samples of drumsticks or claves being struck together would work well here.
Of course, if the sample of the temple block had been loaded into a velocity-sensitive sampler, this particular sequence would have been much easier to create. Figure 3 shows a variation of the previous sequence that would be harder to produce by conventional means. Here the temple-block events (top track) are convolved with a sample of a ride cymbal (middle track). The bottom track is a cross-synthesis of the block and cymbal sounds. This “cymblock,” as I’ll call it, retains the rhythm and accentuation of the blocks but has the overall character of a series of cymbal hits. The cymblock hits overlap each other and sound acoustically realistic. (Notice that convolution can overlap as many events as needed in this way without running out of voices.)
The waveforms don’t show it, but the temple block has a strong resonance at around G5, which intersects with part of the cymbal’s broader spectrum. The common frequencies reinforce each other, and the cymblock takes on the pitch of the temple block. This may be the most interesting feature of this convolution, because there is potential for a “morph.” The original blocks and the cymblock tracks have common rhythmic, accentual, and pitch features; therefore, a careful crossfade can produce an interesting transformation in which the temple block appears to “turn into” the cymblock.
As the cross-synthesis of the cymblock shows, there is really no restriction on what impulses and IRs you choose to convolve together. With a general convolution program, the distinction between impulse and IR is ultimately a mere convention.
If you’d like to experiment with convolution yourself, get hold of some software and start by trying to reproduce the examples shown in Figures 2 and 3. Then branch out by substituting different IR signals. Try convolving temple blocks with a gong, a piano chord, or a string tone cluster. Next, use a variety of impulse signals, such as a cowbell, an open or closed hi-hat, or a cello pizzicato. Impulses needn’t all be percussive, either. Speech, for example, can be an interesting input: try using recorded lines of poetry as the impulse, with a cymbal or snare-drum roll as the IR.
Don’t forget that purely electronic signals can be used to fine effect in a convolution. For a surprisingly good reverb effect, use white noise with a two- to three-second exponential decay as an IR. Filtering the noise colorizes the reverb. Noise with a long linear decay gives an infinite reverb effect. For a truly twisted effect, use an inharmonic FM sound as the “reverb”—if you like the result, you qualify as a hard-core convolutionary.
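Synthesizing the noise IR and applying it takes only a few lines. This sketch builds a 2.5-second exponentially decaying noise burst (the decay constant is chosen so the tail ends up roughly 60 dB down; the sample rate and decay time are illustrative choices within the range suggested above) and convolves it with a dry signal via the FFT.

```python
import numpy as np

sr = 44100
decay_time = 2.5                       # seconds, within the suggested 2-3 s range
n = int(decay_time * sr)
rng = np.random.default_rng(7)
# White noise with an exponential decay; e^-6.9 puts the tail about 60 dB down.
ir = rng.standard_normal(n) * np.exp(-np.arange(n) * 6.9 / n)

def reverberate(dry, ir):
    """Apply the "reverb" by FFT convolution, normalized to avoid clipping."""
    m = len(dry) + len(ir) - 1
    wet = np.fft.irfft(np.fft.rfft(dry, m) * np.fft.rfft(ir, m), m)
    return wet / np.max(np.abs(wet))

click = np.zeros(sr)
click[0] = 1.0                         # a test impulse to audition the tail
tail = reverberate(click, ir)
```

Filtering `ir` before the convolution colors the reverb, exactly as described above; replacing it with an inharmonic FM sound gets you the twisted version.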
Even after running through all these suggestions, you will still have only scratched the surface of this fascinating sonic resource. Convolution, unlike FM synthesis, is not a widely explored, well-documented electronic-music technique. But that is one reason it appeals to the audio adventurer.
John Duesenberry’selectronic compositions are available through the Electronic Music Foundation’s Web site at www.emf.org. If you come up with a really great convolution, e-mail him about it at email@example.com.
Convolution Tools
Convolution software for the desktop is readily available. For the Mac, Tom Erbe’s SoundHack features both general convolution and a binaural processor that uses built-in HRTFs. You can get a free demo version of this sound-file processor via FTP at music.calarts.edu/pub/SoundHack. I definitely recommend paying a small shareware fee for the PowerPC-native version because it’s much faster. BIAS Peak also implements sound-file convolution; check out www.bias-inc.com for information. A real-time convolver, developed with Cycling ’74’s MSP, is available for free at www.spectralnoise.com. James McCartney’s SuperCollider synthesis language includes a demo program that does real-time convolution in about 20 lines of code. SuperCollider can be found at www.audiosynth.com.
On the PC side, convolution is available in two audio editors, SEK’D’s Samplitude and dissidents’ Sample Wrench. The Acoustic Mirror DirectX plug-in from Sonic Foundry is a powerful convolution engine. Lake DSP’s Huron Digital Audio Convolution Workstation, hosted on Windows NT, comprises dedicated DSP hardware and software for 3-D audio, acoustic simulation, auralization, and other applications. Information and audio examples of the Huron are available at www.lakedsp.com.
E-mu Systems has featured convolution, under the name of Transform Multiplication, in a number of its products. These date back to the Emax sampler, making E-mu a pioneer in commercial implementations. The Emax SE, Emax II, ESI-32, ESI-4000, and all EOS systems also come with Transform Multiplication.
Further Reading
Information about convolution in a musical context is far from plentiful. Professor Richard Boulanger, one of the first musicians to realize the potential of convolution, published a discussion in the Spring/Summer 1986 issue of Ex Tempore (Department of Music, University of Alberta). This article is loaded with practical suggestions and is well worth hunting down. Boulanger’s upcoming book on Csound (visit mitpress.mit.edu/e-books/csound) will also cover convolution. Curtis Roads’s invaluable Computer Music Tutorial (MIT Press, 1996) treats the subject from both mathematical and musical standpoints. Many textbooks about digital signal processing, such as C Language Algorithms for Digital Signal Processing (Prentice-Hall, 1991), cover convolution from an engineering perspective and include program code.