Data compression has been a huge boon to the audio industry. MP3, for example, revolutionized music distribution, and Dolby Digital and DTS allowed 5.1-channel soundtracks to be included on DVDs. The point of these lossy compression schemes (also called perceptual coders) is to discard elements of the audio signal that humans would not perceive anyway. For instance, MP3 typically throws away a whopping 90 percent of the audio data, for a compression ratio of about 10:1.
FIG. 1: Spike coding (red) exhibits much better fidelity than Fourier transform coding (black) at bit rates below 60 kbps.
Still, who wouldn't want to fit more audio in a given amount of storage space or hear the same sound quality at lower bit rates? Researchers at Carnegie Mellon University (www.cmu.edu) are developing new audio coding algorithms that promise much greater efficiency than is possible today.
Instead of removing frequencies deemed imperceptible due to masking and other psychoacoustic effects, as MP3 and other perceptual coders do, the CMU algorithms encode audio signals as a series of very short wave packets called spikes, which closely mimic the impulses sent along the auditory nerve from the inner ear to the brain.
Like the inner ear, the CMU algorithms divide incoming sound waves into narrow frequency bands and send short bursts of energy of varying amplitudes at different times. In fact, the spike waveforms closely match the way auditory nerve fibers encode sound, integrating specific frequencies for specific durations, weighted in time by a sharp attack and a slower decay. Interestingly, the spike waveforms were derived independently, after which it was discovered that they closely resemble the filtering properties of the auditory nerve.
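To make the idea of a spike more concrete, here is a minimal Python sketch of a wave packet with a sharp attack and a slower decay, plus a decoder that rebuilds a signal by summing such packets. The article does not say which waveform the CMU researchers use, so the gammatone shape, the ERB bandwidth rule, the 16 kHz sample rate, and the names gammatone_kernel and decode_spikes are all assumptions made for illustration, not the researchers' actual algorithm.

```python
import math

def gammatone_kernel(center_hz, sample_rate=16000, duration_s=0.02, order=4):
    """Unit-energy gammatone-shaped wave packet (an assumed stand-in for a spike)."""
    # Assumption: bandwidth follows the Glasberg-Moore ERB approximation.
    erb_hz = 24.7 * (4.37 * center_hz / 1000.0 + 1.0)
    decay = 2.0 * math.pi * 1.019 * erb_hz
    samples = []
    for i in range(int(duration_s * sample_rate)):
        t = i / sample_rate
        # Envelope rises quickly, then dies away more slowly.
        envelope = t ** (order - 1) * math.exp(-decay * t)
        samples.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    norm = math.sqrt(sum(v * v for v in samples))
    return [v / norm for v in samples]

def decode_spikes(spikes, num_samples, sample_rate=16000):
    """Rebuild a signal from (time_s, center_hz, amplitude) spikes by summing kernels."""
    signal = [0.0] * num_samples
    for time_s, center_hz, amplitude in spikes:
        kernel = gammatone_kernel(center_hz, sample_rate)
        start = int(round(time_s * sample_rate))
        for i, value in enumerate(kernel):
            if 0 <= start + i < num_samples:
                signal[start + i] += amplitude * value
    return signal

# Hypothetical spike list: two bursts in different frequency bands.
spikes = [(0.010, 1000.0, 0.5), (0.025, 2500.0, 0.3)]
signal = decode_spikes(spikes, num_samples=1600)
```

In this sketch each spike is just three numbers (a time, a frequency band, and an amplitude), which hints at why such a code could be compact. The encoder that chooses the spikes is the hard part and is not shown here.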
One way to evaluate the efficiency of a coding method is to look at its fidelity (that is, how closely the decoded signal matches the original signal) at a given bit rate. Using signal-to-noise ratio as an indicator of fidelity, spike coding performs much better than Fourier transform coding at bit rates below 60 kbps (see Fig. 1); perceptual coders such as MP3 perform about the same as Fourier transform coding in this regard.
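For readers who want to see how signal-to-noise ratio works as a fidelity measure, the following Python sketch treats the difference between the original and decoded signals as noise and reports the ratio in decibels. The function name snr_db and the sample values are illustrative assumptions; the article does not describe exactly how the researchers computed the figures plotted in Fig. 1.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, with the coding error treated as noise."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if signal_power == 0:
        raise ValueError("original signal is silent; SNR is undefined")
    if noise_power == 0:
        return math.inf  # perfect (lossless) reconstruction
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical example: every decoded sample is 10 percent too small.
original = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
print(round(snr_db(original, decoded), 2))  # 20.0
```

A higher number means the decoded signal is closer to the original, so a more efficient coder is one that reaches a higher SNR at the same bit rate.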
The CMU researchers expect that spike coding will excel at lossless or near-lossless coding, while exhibiting greater efficiency than current perceptual coders at low bit rates. They predict that their approach could also be used for lossy compression, greatly increasing the efficiency of encoding only perceptually relevant information.
Most of the CMU work has been done with two types of sounds: vocalizations, including speech, and natural environmental sounds, including steady-state sounds (such as rain) and transient sounds (such as leaves and branches crunching underfoot). Preliminary research into musical sounds indicates that they can be encoded using algorithms like those optimized for vocalizations, with similar benefits at low bit rates.
The potential applications for spike coding are myriad. For example, it could lead to much better cochlear implants, giving the deaf a far more accurate aural experience of the world thanks to the code's striking similarity to auditory nerve impulses. According to Michael Lewicki, associate professor of computer science at CMU and a member of the Center for the Neural Basis of Cognition, “If we could use a cochlear implant to ‘talk’ to the auditory nerve in a more natural way via our coding, we could quite possibly design implants that would convey sounds to the brain that are much more intelligible.”
In the musical domain, tunes could sound just as good as they do with current encoders, but at lower bit rates. “We're very excited about this work,” says Lewicki, “because we can give a simple theoretical account of the auditory code that predicts how we could optimize signal processing to one day allow for much more efficient data storage on everything from DVDs to iPods.” This is a laudable goal, and I look forward to hearing it realized.