Conventional notation has allowed performers, composers, and theorists to discuss music for centuries. More recently, people have used images of electronic and digital signals to facilitate those discussions. Signal views are useful when attempting to examine a sound's spectrum or internal structure. Although the ability to view the inside of a sound may have been an absurd notion a century ago, technology has evolved to a point where viewing the makeup of a musical signal is a regular occurrence in recording studios and electronic-music labs. In this column, I will present an overview of some of the “microscopes” that are commonly used for looking at the inside of music today.
CATCHING THE WAVE
In his seminal work On the Sensations of Tone, first published in 1863, Hermann von Helmholtz, a scientific renaissance man, studied and represented complex waveforms. His only tools were resonators that he had invented, trigonometry, and a pencil and paper. Representational techniques improved by the end of the nineteenth century with the advent of the Electronics Age. Spark-gap oscillators produced electrical current that flowed in accordance with simple harmonic motion: quick movement that flows from one direction to the other and then back again. This alternating current produced a corresponding magnetic field, which induced a corresponding electrical signal in a nearby conductor. The transmitted signal could be amplified and sent to a loudspeaker, and thus the radio broadcast was born. Along with the birth of the radio broadcast, however, came the problem of creating and maintaining better broadcasting equipment.
FIG. 1: An oscilloscope displays a time-domain view of a waveform.
In 1897 Karl Ferdinand Braun invented the cathode-ray oscilloscope, which enabled electrical signals to be viewed in real time (see Fig. 1). The oscilloscope display was created with a horizontal-sweep oscillator, which produced a sawtooth signal. This acted as a guide for the scanner to sweep linearly from left to right, thousands of times per second, plotting a glowing dot as it went. The sawtooth oscillator later became the basis for television displays, and the oscilloscope and other testing devices became the key components of an electrician's toolkit. Square- and triangle-wave oscillators were created for testing purposes. Feed a square wave into an electrical system, then view the system's output with an oscilloscope. If the output does not look square, then there is a problem with the system. Decades later, those same oscillators became the raw sonic material in analog synthesizers.
Using a Web search engine, you can find a variety of oscilloscope demonstration applets. One example is The Virtual Oscilloscope (www.virtualoscilloscope.com), created by Peter Debik; another, created by Professor Fu-Kwun Hwang, can be found at National Taiwan Normal University (www.phy.ntnu.edu.tw/java/oscilloscope/oscilloscope.html).
IT'S JUST A MATTER OF TIME
Oscilloscopes create time-domain images, meaning that information is displayed as a function of time. Air-pressure changes of acoustic events are converted to changes in current. The current is input to the oscilloscope, which shows pressure/current changes along the vertical axis and time along the horizontal axis. Although time-domain views of waveshapes may be interesting, they don't tell us much when it comes to examining complex waveforms.
While investigating the propagation of heat in the early nineteenth century, French mathematician Jean-Baptiste Fourier introduced a theorem that became a keystone in wave analysis. The Fourier theorem states that all complex periodic waves are the sum of a set of sinusoidal waves. The frequencies of these waves are all harmonically related, meaning that they are all integer multiples of the lowest frequency (called the fundamental). Each frequency has its own relative amplitude and phase with respect to others. One notable feature in Helmholtz's book is his premise that since musical (pitched) tones are periodic, the Fourier theorem was an effective starting point for studying them.
Perhaps you have seen the rainbow pattern of colors that appears when light shines on a prism. That rainbow pattern appears because white light is composed of several different frequencies (which we perceive as color). Passing the light through the prism causes each frequency to refract by a different amount (due to the differing wavelengths), and thus they separate and can be viewed individually. A Fourier analysis (or transform) is a mathematical procedure analogous to a prism for any complex periodic waveform. The spectrum of a waveform is often a much more useful view than the time-domain waveshape produced by an oscilloscope.
FIG. 2: The frequency content of a time-domain view is not always easy to determine. A spectral view shows the fundamental frequency to be at full amplitude, the third harmonic at one-third this amplitude, and the seventh harmonic at one-half this amplitude.
The combination of simple waves, all at their respective amplitudes and phases, forms the spectrum of a complex wave. A spectral domain (or frequency domain) view of a wave shows amplitude as a function of frequency, meaning that frequencies are represented along the horizontal axis and their respective amplitudes are shown on the vertical axis (see Fig. 2). Because phase is not a significant audible factor in unchanging waveforms, it is generally not represented on spectral plots.
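The relationship between the two views can be sketched in a few lines of Python with NumPy. This is only an illustration: the 100 Hz fundamental and the 8,000 Hz sampling rate are assumed values chosen so the arithmetic comes out cleanly, and the harmonic amplitudes follow the Fig. 2 caption. The code adds three sine waves into one complex wave, then uses a Fourier transform to recover the amplitude of each component.

```python
import numpy as np

SAMPLE_RATE = 8000    # samples per second (assumed for illustration)
FUNDAMENTAL = 100.0   # Hz (assumed for illustration)
# harmonic number -> relative amplitude, following the Fig. 2 caption
HARMONICS = {1: 1.0, 3: 1.0 / 3.0, 7: 0.5}

# One second of signal, so each analysis bin is exactly 1 Hz wide.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
wave = sum(amp * np.sin(2 * np.pi * FUNDAMENTAL * h * t)
           for h, amp in HARMONICS.items())

# Magnitude spectrum, scaled so a sine of amplitude A reads as A.
spectrum = np.abs(np.fft.rfft(wave)) * 2 / len(wave)
freqs = np.fft.rfftfreq(len(wave), d=1 / SAMPLE_RATE)

# Keep only the components that rise clearly above numerical noise.
peaks = {float(f): round(float(a), 3)
         for f, a in zip(freqs, spectrum) if a > 0.01}
print(peaks)
```

Plotting `wave` against `t` gives the oscilloscope-style picture, in which the three ingredients are hard to pick out by eye. The `peaks` dictionary is the spectral view: only 100 Hz, 300 Hz, and 700 Hz appear, at relative amplitudes of 1, one-third, and one-half.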
THE FAST AND THE FOURIER
The Fourier transform, however, is a theoretical procedure and tends to have certain drawbacks when applied to real-life situations. For one thing, it is a mathematically intensive process and is not something that can be simply programmed into a cathode-ray system such as an oscilloscope. With digital technology, a sampled waveform can be analyzed with a discrete Fourier transform (DFT). The DFT is also computationally expensive and cannot produce spectral displays in real time. A more efficient algorithm, called the fast Fourier transform (FFT), was created in 1965. With some restrictions (for example, the number of samples in the signal must be a power of two), the FFT offers tremendous improvements in speed and allows spectra to be displayed in real time.
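To make the difference concrete, here is a minimal Python sketch using NumPy. The function `naive_dft` is a hypothetical name for a direct, textbook-style DFT written for this illustration; `np.fft.fft` is NumPy's FFT routine. The random 256-sample test signal is also an assumption. The point is that both produce the same spectrum, while the amount of arithmetic differs greatly.

```python
import numpy as np

def naive_dft(samples):
    """Direct DFT: every output bin sums over every input sample (n * n work)."""
    n = len(samples)
    k = np.arange(n).reshape(-1, 1)   # output bin index (one per row)
    m = np.arange(n).reshape(1, -1)   # input sample index (one per column)
    basis = np.exp(-2j * np.pi * k * m / n)
    return basis @ samples

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)     # 256 = 2**8, a power of two

slow = naive_dft(signal)
fast = np.fft.fft(signal)
print(np.allclose(slow, fast))        # same spectrum either way

# Rough operation counts: n * n for the direct DFT, n * log2(n) for the FFT.
n = len(signal)
print(n * n, "versus roughly", int(n * np.log2(n)))
```

For a 256-sample window, the direct method needs on the order of 65,536 multiplications, while the FFT needs on the order of 2,048. The gap widens quickly as the window grows.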
Fourier transforms are used in many scientific fields. If you have waves of any kind (electromagnetic, heat, fluid, and so on), sooner or later you'll need a spectrum. When used for music, however, the Fourier transform turns out to have a few drawbacks. For one thing, the transform works for waves that are perfectly periodic, but music is, at best, quasi-periodic. Most sounds produce noisy, irregular transients at their onset followed by periodicity (pitch).
In an attempt to capture the changes in frequency over time, the material is analyzed piecemeal, in short segments or windows. A generous estimate for transient times is 12 ms (about 529 samples of CD-quality audio). To capture attacks effectively, window sizes generally vary from 128 to 1,024 samples. The transform turns each sample into a filter: a window of n samples becomes n fixed-bandwidth filters, with center frequencies at multiples of the sampling rate divided by the window size. This ratio is also equal to the bandwidth of each filter. But there is some overlap among the bands, and frequencies that fall between the filter center frequencies confuse the transform. These “in-between” frequencies get stripped out and are moved over to the output of neighboring filters. This is called spectral “leakage” or “clutter.” The shorter the window, the fewer filters there are, and the more leakage they produce. Longer windows have more filters and less leakage.
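Leakage is easy to demonstrate. The following Python sketch (using NumPy) is an illustration under assumed values: a 1,024-sample window of CD-quality audio, one test tone placed exactly on a filter center frequency, and another placed halfway between two centers. The helper names and the 1 percent threshold are hypothetical choices made for this example.

```python
import numpy as np

SAMPLE_RATE = 44100   # CD-quality audio
WINDOW_SIZE = 1024    # samples per analysis window

# Spacing between filter center frequencies (and the width of each filter).
bin_width = SAMPLE_RATE / WINDOW_SIZE   # about 43 Hz

def magnitude_spectrum(freq_hz):
    """Magnitude spectrum of one unwindowed excerpt of a sine tone."""
    n = np.arange(WINDOW_SIZE)
    tone = np.sin(2 * np.pi * freq_hz * n / SAMPLE_RATE)
    return np.abs(np.fft.rfft(tone))

def bins_above(spectrum, fraction=0.01):
    """Count the filters whose output exceeds a fraction of the strongest one."""
    return int(np.sum(spectrum > fraction * spectrum.max()))

on_center = magnitude_spectrum(10 * bin_width)      # exactly on a center frequency
in_between = magnitude_spectrum(10.5 * bin_width)   # halfway between two centers

print(bins_above(on_center), bins_above(in_between))
```

The tone that sits on a center frequency shows up in a single filter. The in-between tone spills into dozens of neighboring filters, even though it is just as pure a sine wave.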
Another problem is that transients get “smeared” over the length of the window, which makes them lose their precision. This is called localization blur. Localization blur is reduced with shorter window lengths. Shorter windows, however, produce a less accurate view of the frequencies in the spectrum. Window size is therefore a tradeoff: longer windows increase spectral resolution but diminish time resolution, whereas shorter windows increase time resolution but diminish spectral resolution.
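The tradeoff can be put into numbers with a short Python sketch. The function name `window_tradeoff` is hypothetical, and the calculation assumes CD-quality audio at 44,100 samples per second. For each window size, it reports how much time one window spans and how wide each frequency band is.

```python
SAMPLE_RATE = 44100   # CD-quality audio (assumed)

def window_tradeoff(window_size, sample_rate=SAMPLE_RATE):
    """Return (window duration in ms, width of each frequency band in Hz)."""
    duration_ms = 1000 * window_size / sample_rate
    bin_width_hz = sample_rate / window_size
    return duration_ms, bin_width_hz

for size in (128, 256, 512, 1024):
    duration_ms, bin_width_hz = window_tradeoff(size)
    print(f"{size:5d} samples: {duration_ms:6.2f} ms per window, "
          f"{bin_width_hz:7.2f} Hz per band")
```

Doubling the window size doubles the time span and halves the bandwidth, so the product of the two never changes. Improving one kind of resolution always costs the same proportion of the other.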
THROUGH THE LOOKING GLASS
The inaccuracies of spectral leakage and localization blur are minimized by “arithmetic preheating.” A set of samples is not simply lifted from the signal and then analyzed. Normally they are all multiplied by a symmetrical window function before the analysis (see Fig. 3). When every sample in the window is multiplied by a corresponding value in a window function, the excerpt appears more periodic to the transform, resulting in an analysis that is a better approximation of the signal's actual frequency contents.
Most programs offer a choice of window functions, with names such as Hamming, Blackman, or rectangular. A rectangular window means that no window function is applied at all; the differences among the others are often subtle. Choose anything other than rectangular, and you should get a good spectral rendering.
FIG. 3: Each sample in a windowed excerpt is multiplied by the corresponding entry in a window function. The result more closely resembles a periodic wave and produces a more accurate spectral analysis.
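The effect of a window function can be sketched in Python with NumPy, which provides `np.hamming` and `np.blackman`. Everything else here is an assumption made for illustration: the 1,024-sample excerpt, the test tone placed halfway between two filter center frequencies (a worst case for leakage), and the hypothetical helper `far_leakage_db`, which measures the strongest output found well away from the tone.

```python
import numpy as np

SAMPLE_RATE = 44100
WINDOW_SIZE = 1024
bin_width = SAMPLE_RATE / WINDOW_SIZE

# A pure tone that falls halfway between two filter center frequencies.
n = np.arange(WINDOW_SIZE)
excerpt = np.sin(2 * np.pi * (10.5 * bin_width) * n / SAMPLE_RATE)

def far_leakage_db(frame, first_far_bin=100):
    """Strongest output far from the tone, in dB relative to the spectral peak."""
    spectrum = np.abs(np.fft.rfft(frame))
    return float(20 * np.log10(spectrum[first_far_bin:].max() / spectrum.max()))

# Multiply each sample by the corresponding entry in the window function.
rectangular = far_leakage_db(excerpt)   # no window function applied
hamming = far_leakage_db(excerpt * np.hamming(WINDOW_SIZE))
blackman = far_leakage_db(excerpt * np.blackman(WINDOW_SIZE))

print(f"rectangular: {rectangular:.1f} dB")
print(f"Hamming:     {hamming:.1f} dB")
print(f"Blackman:    {blackman:.1f} dB")
```

More negative numbers mean less leakage. In this test the unwindowed excerpt leaks the most, and both window functions push the stray energy farther down.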
WHITER SHADES OF GREY
When you make changes to EQ settings, you are fine-tuning the spectrum of your signal, adding or subtracting from different spectral regions. Some meters give you the option of a dynamic EQ display. This is analogous to motion-picture frames: the display is a series of short-time Fourier transforms (STFTs) that are viewed in rapid succession. This can be an invaluable visual reference when you're mixing. Does your material have too much high end? This is easy to see on a dynamic EQ. Some multiband EQ plug-ins include a graphic frequency response. Equalizing, however, is analogous to adding spice to food: it's best when done in moderation.
FIG. 4: This time-varying spectrum shows that the strongest frequency components are the fundamental and 4th harmonic (1.176 kHz). The transient consists of short appearances of what appear to be the 3rd harmonic (882 Hz), the 10th harmonic (2.940 kHz), and the 15th harmonic (4.410 kHz).
Other times, you may want something more static so that you can make a more detailed investigation. Perhaps you're trying out the latest convolution plug-in and want to know what a given impulse response is adding to (or removing from) your material. In this case, a three-dimensional, time-varying spectrum may be what you want. This is essentially a series of STFTs stitched together as a 3-D graph to show spectral evolution over time. Time and frequency are represented on two horizontal axes, and intensity is represented by height. Fig. 4, which was created using Steinberg's WaveLab, shows a time-varying spectrum of a vibraphone playing the note D3 (about 294 Hz). Although 3-D displays can be effective for viewing the behavior of individual harmonics, the contours of many envelopes are obscured by the contours of others.
Another time-based representation, the spectrogram (or sonogram), manages to collapse the three acoustic dimensions into two spatial dimensions. Spectrograms show frequency as a function of time, meaning time is represented by the horizontal axis and frequency is represented by the vertical axis. The intensity of a given frequency region is represented by shading, with greater contrast representing greater intensities. Fig. 5 shows a spectrogram of a vibraphone sample created in AudioXplorer. Spectrograms are often used in phonetic analysis. They also can be helpful in targeting potential problems. An intrusive band of noise might show up as a dark horizontal streak. Once you see which frequencies it spans, you might be able to remove the noise with filtering.
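The data behind a spectrogram can be sketched in Python with NumPy. This is an illustration under assumptions, not the method used by any particular program: the function name `spectrogram`, the 512-sample window, the 256-sample hop between windows, the Hann window function, and the two-tone test signal are all choices made for this example.

```python
import numpy as np

def spectrogram(signal, window_size=512, hop=256):
    """Return magnitudes with shape (frames, bands): one windowed STFT per row."""
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.array([signal[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

SAMPLE_RATE = 8000                          # assumed for illustration
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE    # one second of time points
# Test signal: 500 Hz for the first half second, then 2,000 Hz.
tone = np.where(t < 0.5,
                np.sin(2 * np.pi * 500 * t),
                np.sin(2 * np.pi * 2000 * t))

magnitudes = spectrogram(tone)
bin_width = SAMPLE_RATE / 512               # Hz per frequency band

# Strongest frequency in each frame, from the first window to the last.
strongest_hz = magnitudes.argmax(axis=1) * bin_width
print(magnitudes.shape)
print(strongest_hz[0], strongest_hz[-1])
```

Each row of `magnitudes` is one short-time spectrum. Drawing the rows side by side, with each value rendered as a shade, gives the familiar picture of time running horizontally and frequency running vertically. Here the strongest band moves from 500 Hz in the first frame to 2,000 Hz in the last.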
FIG. 5: A close look at a spectrogram shows what looks like low-level noise in addition to the harmonics, particularly within the first 5 ms or so.
LOOK WITH YOUR EARS
The various time-domain and spectral views of musical material are to audio producers what scales are to musicians. They are essential to audio ear training. The illustrations reinforce what you are hearing. Whether you are troubleshooting musical material in a hurry or making a careful study of your work, knowing how to look at what you are hearing can sharpen the decisions that you make.
Mark Ballora teaches music technology at Penn State University. Special thanks to Kevin Larke and Kurt Hebel for their help in preparing this piece.