Square One: Snap, Crackle, Pop

FIG. 1: This screen shot shows some of the unpleasant damage you see when transferring old tapes or vinyl records for preservation, repurposing, or archiving. Fixing such damage is the object of audio-restoration software.

As the saying goes, stuff happens. No matter how hard we try to avoid it, a steady chain of events works to degrade every recording we make (see Fig. 1). Recording devices and analog media introduce noise into a recording, then media break, scratch, and wear. At some point, the accumulation of those effects forces us to look for ways to restore the original glory of a valued recording. And that's where the art and science of audio restoration comes in.

Various techniques, such as the RIAA curve for vinyl or the various Dolby noise-reduction methods for tape, are employed to improve the performance of recording devices. But audio restoration is used when the damage has already been done and such preventative processing is no longer an option. Analog techniques exist: for example, if you have two damaged source copies, you can splice them together to circumvent the flaws in each. And there are 2-stage analog declickers, which are designed to find clicks using a highpass filter and then reduce them by engaging a lowpass filter.

The use of digital signal processing (DSP) has brought great advances to the field of audio restoration and has simultaneously opened it to a much wider audience. Even very inexpensive software now includes tools to remove clicks and hiss from recordings, making it easier than ever before for users to create good-sounding digital files from their favorite LPs. More-sophisticated versions of these tools are being used to archive and restore recordings of great historical, commercial, and cultural value.

Meet the Enemy

FIG. 2: A click is immediately apparent by its contrast with the surrounding music waveform, and intuitively just as easy to fix by interpolation.

The damage that occurs to recordings, whether as a result of the recording/reproduction process or the physical degradation of the medium, can be broadly divided into two categories: short-term damage and long-term damage. Short-term damage includes clicks (see Fig. 2), crackle, thumps, and pops; long-term damage includes hiss and wow, or speed fluctuations. (Distortion is another type of damage, but one that is largely beyond our current capabilities to repair.)

In some cases the damaged signal consists of the original signal and an undesirable signal, but in other cases the damage obliterates the original signal, leaving nothing but the noise. To fix the former type of damage, you attempt to remove the noise and leave only the undamaged original; to fix the latter type, you try to reconstruct the missing part of the original. In both cases, however, you must first find a way to distinguish the noise from the original signal.

It turns out that the best tools for the job are your ears. Even the untrained ear can easily distinguish between the sound of music and that of tape hiss, clicks, and pops. If you are properly motivated, as when listening to a distant broadcast or a worn-out recording of a favorite musician, you can discard the interference and savor only the original signal. Audio-restoration software attempts to mimic this innate human talent through carefully crafted DSP algorithms.

Divide and Conquer

Several methods are available for detecting clicks, crackle, and other similar noises. Clicks are primarily composed of high-frequency energy, so if you run the signal through a highpass filter, you can remove most of the musical energy and see little but the clicks. Clicks also have an extraordinarily short rise time, so you can analyze the waveform with wavelet filters (which see a signal in both the frequency and the time domain), and then flag the points that look like cliffs.
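To make the highpass idea concrete, here is a minimal Python sketch. It is an illustration only, not the method of any particular product: the second-difference filter, the function name detect_clicks, and the threshold of 8 are all assumptions chosen for the demonstration.

```python
import math
from statistics import median

def detect_clicks(samples, threshold=8.0):
    """Flag sample indices where a crude highpass output is unusually large."""
    # A second difference acts as a simple highpass filter: slowly changing
    # music largely cancels out, while an abrupt spike survives.
    hp = [samples[i - 1] - 2 * samples[i] + samples[i + 1]
          for i in range(1, len(samples) - 1)]
    # The median absolute value is a robust estimate of the "normal" level.
    typical = median(abs(v) for v in hp) or 1e-12
    # hp[j] belongs to sample j + 1 because the filter skips the first sample.
    return [j + 1 for j, v in enumerate(hp) if abs(v) > threshold * typical]

# Demo: a 220 Hz tone sampled at 44.1 kHz with a one-sample click at index 500.
rate = 44100
signal = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(1000)]
signal[500] += 0.8
flagged = detect_clicks(signal)
```

The click and its immediate neighbors are flagged because the filter spans three samples. Real material with strong high-frequency content (cymbals, sibilance) would need a more careful threshold than this sketch uses.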

Algorithms exist that can dependably predict the behavior of a music waveform based on its prior shape. When a click occurs, that prediction will prove to be quite wrong. This sort of time-domain modeling of the waveform borrows from the fields of probability theory and random-signal theory. It is one of the major advancements in restoration and continues to be a hot topic of research.
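The prediction idea can be sketched in a few lines of Python. This is a deliberately small illustration under stated assumptions: it uses a second-order linear predictor fitted by least squares to the whole signal, whereas practical restoration tools use far higher orders and more robust statistics. The names fit_ar2 and prediction_outliers and the threshold of 8 are hypothetical.

```python
import math
from statistics import median

def fit_ar2(x):
    """Least-squares fit of x[n] ~ a1 * x[n-1] + a2 * x[n-2]."""
    r11 = r12 = r22 = p1 = p2 = 0.0
    for n in range(2, len(x)):
        r11 += x[n - 1] * x[n - 1]
        r12 += x[n - 1] * x[n - 2]
        r22 += x[n - 2] * x[n - 2]
        p1 += x[n] * x[n - 1]
        p2 += x[n] * x[n - 2]
    det = r11 * r22 - r12 * r12          # solve the 2x2 normal equations
    return (p1 * r22 - p2 * r12) / det, (p2 * r11 - p1 * r12) / det

def prediction_outliers(x, threshold=8.0):
    """Indices where the waveform departs sharply from its own prediction."""
    a1, a2 = fit_ar2(x)
    errors = [abs(x[n] - a1 * x[n - 1] - a2 * x[n - 2])
              for n in range(2, len(x))]
    typical = median(errors) or 1e-12
    # errors[j] belongs to sample j + 2 because prediction needs two priors.
    return [j + 2 for j, e in enumerate(errors) if e > threshold * typical]

# Demo: the same kind of tone as before, with a click at index 500.
rate = 44100
signal = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(1000)]
signal[500] += 0.8
flagged = prediction_outliers(signal)
```

Note that the samples just after the click can be flagged too, because their predictions are built from the damaged sample. Fitting the predictor to data that contains the click also biases it, which is one reason real systems estimate the model more carefully.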

Once a click is found, the simplest fix is to interpolate between the last good sample before the click and the next one after. Some more-advanced algorithms first dissect the signal on a frequency basis (by doing a discrete Fourier transform), and then interpolate within each frequency bin (storage unit) before reconstructing the signal.
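The simplest fix described above, a straight line drawn between the good samples on either side of the click, might look like this in Python. The function name interpolate_gap and its arguments are illustrative assumptions, and the sketch assumes the damaged span does not touch the very start or end of the recording.

```python
def interpolate_gap(samples, start, end):
    """Replace samples[start:end] with a straight line between neighbors.

    Assumes 1 <= start < end < len(samples), so a good sample exists on
    both sides of the damaged span.
    """
    left, right = samples[start - 1], samples[end]
    span = end - (start - 1)             # distance between the two good samples
    repaired = list(samples)             # leave the caller's data untouched
    for i in range(start, end):
        frac = (i - (start - 1)) / span
        repaired[i] = left + frac * (right - left)
    return repaired

# Demo: the value 0.9 at index 3 is a click in an otherwise smooth ramp.
wave = [0.0, 0.1, 0.2, 0.9, 0.4, 0.5]
fixed = interpolate_gap(wave, 3, 4)
```

Here the click is replaced by a value halfway between 0.2 and 0.4. Linear interpolation is adequate for a gap of a sample or two; longer gaps are where the frequency-domain and model-based approaches earn their keep.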

In using interpolation, however, you assume that the original signal is unrecoverable. If you believe the signal is recoverable, you can attempt to remove the click. The trick is to know which part of the damaged signal is good and which is bad. One solution is to model the behavior of the click based on the analysis of similar noises. You can then subtract the estimated spectrum of the noise from the damaged signal, leaving only the original.

Making Hiss-tory

Eliminating hiss and other sorts of steady-state noise requires a different set of tools. Although lowpass filtering can reduce the most obvious part of hiss, it often dulls the original signal in the process. To avoid having that happen, hiss-detection algorithms first divide the signal into time slices, typically looking at windows of either 1,024 or 2,048 samples. At each window (or frame), the signal is then separated into frequency bins. If this sounds a lot like the sort of frequency coding that goes into many perceptual data-reduction codecs, it is. What we do with these frequency bins is different, however.
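As a rough illustration of the slicing step, the following Python sketch cuts a signal into half-overlapping 1,024-sample frames and measures the energy in each frequency bin of one frame. The function names, the Hann window, the 50 percent overlap, and the slow but dependency-free DFT are assumptions made for the example; a real tool would use an FFT.

```python
import cmath
import math

def split_frames(samples, size=1024, hop=512):
    """Cut the signal into overlapping time slices (frames)."""
    return [samples[s:s + size]
            for s in range(0, len(samples) - size + 1, hop)]

def bin_magnitudes(frame):
    """Magnitude of each frequency bin from 0 Hz up to half the sample rate."""
    size = len(frame)
    # A Hann window tapers the frame edges to limit leakage between bins.
    windowed = [v * (0.5 - 0.5 * math.cos(2 * math.pi * n / size))
                for n, v in enumerate(frame)]
    # Precomputed complex exponentials for the naive DFT below.
    twiddle = [cmath.exp(-2j * math.pi * k / size) for k in range(size)]
    return [abs(sum(v * twiddle[(k * n) % size]
                    for n, v in enumerate(windowed)))
            for k in range(size // 2 + 1)]

# Demo: a 1 kHz tone at 44.1 kHz lands in the bin nearest 1 kHz.
rate = 44100
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(4096)]
frames = split_frames(tone)
mags = bin_magnitudes(frames[0])
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * rate / 1024         # center frequency of the loudest bin
```

With 1,024-sample frames at 44.1 kHz, each bin is about 43 Hz wide, so the tone shows up in bin 23, centered near 990 Hz.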

Because voice and music signals typically have fairly predictable patterns of fundamentals and overtones, relatively few of the frequency bins at a particular moment will hold most of the musical energy. The hiss, being unrelated to the music signal, is spread randomly across the frequency bins, leaving the music bins with an improved signal-to-noise ratio compared with the total signal. This makes the job of reducing the noise without affecting the music signal easier.

The most common way to characterize the hiss within a damaged signal is by taking a “noise print” from a silent section, such as the gap between tracks. Other methods may apply generalized noise profiles, while still others make assumptions about the characteristics of the desired signal and adapt the noise profile based on the current total (original plus noise) signal. The hiss is then subtracted at each frequency bin, and the frames are reassembled using an “overlap and add” technique to create the improved output signal.
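The noise-print, subtract, and overlap-add sequence can be sketched end to end in Python. Treat this as a toy under stated assumptions rather than a description of any shipping algorithm: frames are only 64 samples long so that a plain DFT runs quickly, the "hiss" is synthetic Gaussian noise, the original phase is reused unchanged, and every function name and the amount parameter are hypothetical.

```python
import cmath
import math
import random

FRAME, HOP = 64, 32   # demo sizes; real tools use windows of 1,024 or 2,048
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / FRAME) for n in range(FRAME)]
TWIDDLE = [cmath.exp(-2j * math.pi * k / FRAME) for k in range(FRAME)]

def dft(x, inverse=False):
    """Naive O(N^2) DFT of a FRAME-length sequence (stand-in for an FFT)."""
    out = []
    for k in range(FRAME):
        acc = 0j
        for n in range(FRAME):
            w = TWIDDLE[(k * n) % FRAME]
            acc += x[n] * (w.conjugate() if inverse else w)
        out.append(acc / FRAME if inverse else acc)
    return out

def frame_spectra(samples):
    """Windowed spectra of half-overlapping frames."""
    return [dft([samples[s + n] * WINDOW[n] for n in range(FRAME)])
            for s in range(0, len(samples) - FRAME + 1, HOP)]

def noise_print(noise_only):
    """Average magnitude per bin over a section that contains only hiss."""
    spectra = frame_spectra(noise_only)
    return [sum(abs(sp[k]) for sp in spectra) / len(spectra)
            for k in range(FRAME)]

def reduce_hiss(samples, profile, amount=1.0):
    """Subtract the noise print from each bin, then overlap-add the frames."""
    out = [0.0] * len(samples)
    for i, spectrum in enumerate(frame_spectra(samples)):
        cleaned = []
        for k, value in enumerate(spectrum):
            mag = max(abs(value) - amount * profile[k], 0.0)  # floor at zero
            cleaned.append(cmath.rect(mag, cmath.phase(value)))  # keep phase
        frame = dft(cleaned, inverse=True)
        for n in range(FRAME):
            out[i * HOP + n] += frame[n].real   # overlapping frames sum up
    return out

# Demo: a 440 Hz tone at 8 kHz buried in synthetic hiss. The extra 512
# noise samples stand in for the silent gap between tracks.
rng = random.Random(1)
clean = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(2048)]
hiss = [rng.gauss(0, 0.05) for _ in range(2048 + 512)]
noisy = [c + h for c, h in zip(clean, hiss)]
profile = noise_print(hiss[2048:])
restored = reduce_hiss(noisy, profile)
```

The half-overlapped Hann windows sum to one, so the frames reassemble seamlessly everywhere except the first and last half frame. Lowering amount below 1.0 trades hiss reduction for fewer artifacts.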

A Little Birdie

One of the most common side effects of hiss reduction is a warbling sound affectionately known as a “birdie” or “musical noise.” Because the hiss is subtracted on a frame-by-frame basis and it is not identical from frame to frame, the timbre of the residual hiss changes over time, creating a disconcerting semipitched warble. Often the most useful fix is to settle for less hiss reduction, and the majority of hiss-reduction algorithms allow you to vary the amount of reduction.

Some algorithms reduce the musical-noise effect by comparing each frame's spectrum with adjacent frames. Because music and speech most often depend on signals that remain steady over the very short duration of a few frames (whereas noise by definition changes randomly), a clearer distinction can then be drawn between the original signal and the hiss.
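One simple way to compare a frame with its neighbors, offered here as an assumption rather than as the technique any specific product uses, is to take the median of each bin's magnitude across adjacent frames. A sustained tone passes through untouched, while a blip that appears in only one frame is discarded. The function name smooth_across_frames and the radius parameter are hypothetical.

```python
from statistics import median

def smooth_across_frames(frames, radius=1):
    """Median-filter each frequency bin's magnitude over neighboring frames."""
    smoothed = []
    for t in range(len(frames)):
        lo = max(0, t - radius)
        hi = min(len(frames), t + radius + 1)
        smoothed.append([median(frames[u][k] for u in range(lo, hi))
                         for k in range(len(frames[t]))])
    return smoothed

# Demo: five frames of two bins each. Bin 0 holds a steady tone;
# bin 1 holds an isolated blip in the middle frame only.
mags = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.7], [1.0, 0.0], [1.0, 0.0]]
result = smooth_across_frames(mags)
```

After smoothing, every frame reads [1.0, 0.0]: the tone survives and the one-frame blip, the raw material of a birdie, is gone. The cost is some smearing of genuine transients, which is why the window is kept to a few frames.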

Psychoacoustic analysis is helping researchers design ever-more-useful algorithms by fine-tuning the trade-offs between hiss reduction and preservation of the original signal. Following the Hippocratic edict to “do no harm,” you don't need to process those parts of the signal in which the hiss is unnoticeable as a result of masking. This reduces not only the likelihood of birdies and other artifacts, but also the computational load.

That's Not All, Folks

FIG. 3: A thump or pop is distinguished from a simple click in that it resonates for a short time. This resonance makes a thump or pop more difficult to repair than a click.

Thumps and pops are both defects that resemble low-frequency clicks except that they resonate for a short time (see Fig. 3). The attack of the thump may obliterate the signal (or nearly so), but as the thump resonates and fades, it is blended with the signal. This requires a more complex model of the defect's behavior than is needed for a click. The modeling is made more difficult because the resonance's pitch changes as the thump fades. Very often the original signal must be reconstructed by interpolation or modeling for the first few samples; the rest can be retrieved (hopefully) by subtraction of the thump.

Wow (the periodic variation in a recording's speed, as when a vinyl record's hole is punched off center) can be addressed by mapping the speed fluctuations, and then performing a variable resampling on the source with a profile that is the inverse of the measured speed map. This is another application in which time slicing and frequency coding are useful. By comparing the dominant frequencies at adjacent frames (assuming once again that voice and music signals ordinarily sustain frequencies over multiple frames), you can track the pitch (and therefore the speed) fluctuations.
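The resampling half of that process can be sketched in Python, assuming the speed map has already been measured (the pitch tracking itself is not shown). In this illustration, speed[n] is the ratio of actual to nominal speed at recorded sample n, linear interpolation stands in for a proper resampling filter, and the function names are hypothetical.

```python
import math

def programme_positions(speed):
    """Original-programme time (in samples) reached at each recorded sample."""
    position = [0.0]
    for s in speed[:-1]:
        position.append(position[-1] + s)
    return position

def correct_wow(recorded, speed):
    """Resample so the output advances through the programme at a steady rate."""
    position = programme_positions(speed)
    out, j, m = [], 0, 0
    while m <= position[-1]:
        # Find the pair of recorded samples that straddles programme time m.
        while position[j + 1] < m:
            j += 1
        frac = (m - position[j]) / (position[j + 1] - position[j])
        out.append(recorded[j] + frac * (recorded[j + 1] - recorded[j]))
        m += 1
    return out

# Demo: a 10 Hz tone sampled at 1 kHz, captured with a 2 percent wow that
# repeats every 500 samples.
rate, freq = 1000, 10
speed = [1 + 0.02 * math.sin(2 * math.pi * n / 500) for n in range(2000)]
recorded = [math.sin(2 * math.pi * freq * p / rate)
            for p in programme_positions(speed)]
steady = correct_wow(recorded, speed)
```

In this demo the wow pushes the recorded tone about three samples out of step at its worst, and the resampled output tracks the ideal tone closely again. The quality of the result depends almost entirely on how accurately the speed map was measured.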

All of this analysis takes a lot of computing muscle, which is why the first audio-restoration products depended on hardware DSP. Current research into ever-more-complex models for detection and repair banks on a continued increase in available processing power. As a result, you can look forward to more and better tools to restore your valued but degraded audio.

Brian Smithers is department chair of workstations at Full Sail University and the author of Mixing in Pro Tools: Skill Pack (Cengage Learning, 2006).