Audio- (as well as video-) data reduction, also known as data compression, is one of the most important media technologies to come along in recent years. Many capabilities that you take for granted — streaming audio, fast music downloads, and DVD surround sound, to name a few — simply would not exist without the ability to reduce audio data to a fraction of its size while retaining most of its fidelity.
But many people have only a vague idea of how those key technologies operate. Other articles in EM have covered the how-tos of compression in various formats. This one examines the principles that underlie audio-data compression in order to help you get the most from the technology. When you know what's under the hood, you're in a better position to understand when using audio-data reduction is appropriate, what the impact will be on fidelity, and how to select the right data-reduction scheme for the application.
Before moving ahead, a disclaimer is in order: it won't be possible to discuss every single audio-data-reduction scheme that has been developed for commercial and noncommercial use, because too many exist. Fortunately, the principles covered apply to nearly all the schemes.
WHY AUDIO-DATA COMPRESSION?
The purpose of reducing audio data is to get a free lunch, so to speak. By collapsing the data in a song or track to a fraction of its original size, you can get more out of any transmission channel or storage medium. Here are just a few benefits audio-data compression has brought:
- fast downloading and streaming of songs and albums from the Internet;
- discrete surround sound in DVD;
- compact optical media such as MiniDisc;
- more channels of recording and playback in multitrack digital audio workstations (DAWs); and
- vastly increased audio storage in CD-ROM and hard disk.
Audio-data compression is mostly used to fit some number of audio channels into a space in which they would never fit as linear pulse-code modulation (PCM) while preserving something more or less resembling CD quality.
There's another way of looking at it, though. When you compress audio, you're actually enhancing the quality of what can be carried by a given channel. For example, games and other computer applications have long used monaural, 8-bit audio at a sampling rate of 22 kHz (equivalent to a bit rate of 176 kbps, which is frequently used for MP3 downloads) for sound effects and music. The difference is that 8-bit mono at 22 kHz sounds awful; stereo MP3 at the same bit rate, however, is considered highly listenable by most folks. That is, you could have transmitted terrible audio at the same rate, but you didn't. Rather, you used data reduction technology to improve fidelity for this particular channel-data rate. Although the emphasis today is on fitting audio into small spaces and through thin pipes, the identical notions can be applied to deliver much higher fidelity audio from standard media such as CD.
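The bit-rate figure above is simple arithmetic, and a short Python sketch makes it concrete. It assumes that "22 kHz" means the common 22.05 kHz rate; the function name is only illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of linear PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Assumption: "22 kHz" is taken to be 22,050 Hz, half the CD sampling rate.
game_audio = pcm_bit_rate(22_050, 8, 1)   # 8-bit mono
cd_audio = pcm_bit_rate(44_100, 16, 2)    # 16-bit stereo, as on CD

print(game_audio / 1000, "kbps")   # 176.4 kbps
print(cd_audio / 1000, "kbps")     # 1411.2 kbps
```

Seen this way, a 176 kbps MP3 stream carries stereo music in one-eighth of the space that CD-format PCM would need.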
One point to take home from the study of audio compression is that linear PCM is actually an inefficient way to encode audio. Fig. 1 illustrates the differences between the frequency response of PCM and that of a typical audio signal. Even though most real-world audio signals have markedly less energy in the high frequencies than, say, completely flat white noise, PCM encodes all signals as though they were flat, which wastes a lot of bits.
LOSSY AND LOSSLESS
Any method that reduces the size of audio data can be referred to as data compression, but there is an important distinction to be made between methods that reduce data in such a way that it can be restored bit for bit — so-called lossless compression — and methods, known collectively as lossy compression, that allow the essence of the sound to be restored but don't preserve the precise bits. The latter includes most familiar forms of audio-data compression, including MP3, Dolby's Audio Coding 3 (AC-3), and Sony's Adaptive Transform Audio Coding (ATRAC) compression (used in MiniDisc). Lossy compression schemes are useful because they can reduce data size quite a bit more than lossless techniques. Experience shows that the results can be more than acceptable for most listeners — not perfect, but good enough.
By definition, lossless compression schemes, including Meridian Lossless Packing (MLP) used in DVD-Audio, must restore every bit of the original uncompressed audio data — the audio equivalent of a Zip or StuffIt archive. No one in his or her right mind would compress a file of important text or numeric data in such a manner that the data would come back approximately right. It's either all correct or useless. Lossless audio compression is the same. If a compression method is truly lossless, you can be confident it will not corrupt a single bit.
Questions about fidelity that apply to lossy methods such as MP3 do not apply to lossless compression methods. The trade-off is that lossless techniques cannot achieve nearly as much compression as those that intentionally eliminate information.
Because the word compression is usually applied to both lossy and lossless techniques, confusion can occur. In this article, I will use the term data packing to describe lossless audio-data compression schemes; the familiar compression will be used only in reference to lossy methods.
A third class of compression techniques sits midway between data packing and lossy compression. It can be described as nearly lossless because the audio data is not necessarily returned with perfect bit accuracy upon decompression; only slight deviation is tolerated, however, and fidelity remains transparent, even for trained listeners in controlled listening tests. Notable among compression standards that claim that status is Digital Theater Systems' Coherent Acoustics (more commonly called DTS Digital Surround or simply DTS), which is used increasingly in 5.1- and 6.1-surround tracks on DVD. The DTS stream occupies about four times the bandwidth of Dolby AC-3, and many listeners prefer its audio quality.
Data packing, near-lossless compression, and lossy compression share many of the same techniques. The exception is masking-based perceptual coding, which is inherently lossy and used to secure the much higher compression ratios characteristic of lossy schemes such as MP3.
Fig. 2 shows a block diagram of a somewhat generic coder that relies on techniques typical of lossless coding schemes, without the use of perceptual coding. The process illustrated can be made truly lossless, provided that individual elements in the chain are designed appropriately.
FRAMING AND FILTERING
In any audio coder, incoming PCM data is initially stored in a data buffer. The buffer allows for analysis of data across time. Generally speaking, audio coders process data in blocks, or frames, of a few hundred to a few thousand samples. The exact size of each block depends on the particular compression scheme as well as the sampling rate of incoming data, the target bit rate, and the characteristics of the incoming signal.
Larger block sizes permit more efficient coding of steady-state tonal signals but can result in audio artifacts when transients are encountered. Recognizing that different frame sizes are optimal for different types of signals, some compression schemes change the frame size on the fly, using smaller frames when transients are present and reverting to larger frames when the audio is holding more constant. Frame sizes of 256 to 4,096 samples are employed in common coding schemes, with 1,024 something of a standard default.
In most coding schemes, the first major step in processing is the division of each block of incoming data into some number of subbands using a digital filter bank. The number of bands can vary. Some well-known compression schemes use 32 subbands, but others use fewer.
Different coding schemes employ different filter designs. The most common designs are polyphase, Modified Discrete Cosine Transform (MDCT), and hybrid filters, which are a combination of polyphase and MDCT. Filter selection is based on trade-offs of processing efficiency, sharpness (which affects the amount of data reduction that can be achieved), and accuracy of reconstruction. If true lossless coding is required, the filters must be designed so that the original samples can be reconstructed exactly.
The effect of this filtering is to divide the samples in the original source frame into smaller numbers of samples in “bins” representing each subband. Thus, if the frame size is 1,024 samples and 32 subbands are used, 32 samples will be used to represent audio in each subband (see Fig. 3). If the filters are designed appropriately, the 32 bins of 32 samples can be used to reconstruct the original block of 1,024 samples with perfect accuracy.
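To make the 1,024-into-32-by-32 bookkeeping concrete, here is a minimal Python sketch using NumPy. It is not the polyphase or MDCT filter bank of any real coder: as a stand-in, it applies a plain block DCT to 32 short time blocks, which is enough to show the split and the reconstruction. The function names are illustrative, and the reconstruction is exact only to floating-point precision, not bit for bit.

```python
import numpy as np

FRAME = 1024   # samples per frame
BANDS = 32     # number of subbands

def dct_matrix(n):
    """Orthonormal DCT-II matrix; each row is one cosine basis function."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    t = np.arange(n).reshape(1, -1)   # time index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

D = dct_matrix(BANDS)

def analyze(frame):
    """Split one frame into 32 subbands of 32 samples each."""
    blocks = frame.reshape(FRAME // BANDS, BANDS)   # 32 short blocks in time
    return (blocks @ D.T).T                         # row b holds subband b

def synthesize(subbands):
    """Invert analyze(); D is orthonormal, so its transpose is its inverse."""
    return (subbands.T @ D).reshape(FRAME)

rng = np.random.default_rng(0)
frame = rng.uniform(-1.0, 1.0, FRAME)
subbands = analyze(frame)
rebuilt = synthesize(subbands)
print(subbands.shape)                       # 32 bins of 32 samples
print(np.max(np.abs(rebuilt - frame)))      # reconstruction error, near zero
```

The total number of values is unchanged by the split; the gain comes later, when each band is coded according to its own content.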
After it has been divided into subbands, audio can be processed with a much higher degree of sophistication — that is, with accurate gain ranging, prediction, and psychoacoustic analysis for each band.
The amount of total energy in each subband varies. Typically, audio energy is centered in a few bands, with other bands showing much less signal. For each band, a scaling factor can be derived. The signal in that band can then be normalized to full scale for processing and then scaled back to the original level in decoding. Over time these block-by-block scale factors describe the amplitude envelope of the signal in all of the subbands, much in the way a classic vocoder extracts the changing levels of different bands in a program signal.
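The scale-factor idea can be sketched in a few lines of Python with NumPy. This is an illustration only: it uses the peak level of each band as its scale factor, which is an assumption, and real coders quantize the scale factors to a small set of steps.

```python
import numpy as np

def normalize_bands(subbands):
    """Scale each band so that its peak sits at full scale (1.0)."""
    peaks = np.max(np.abs(subbands), axis=1)
    scale = np.where(peaks > 0, peaks, 1.0)   # leave silent bands untouched
    return subbands / scale[:, None], scale

def restore_bands(normalized, scale):
    """Decoder side: put each band back at its original level."""
    return normalized * scale[:, None]

# Three made-up bands: one loud, one quiet, one silent.
bands = np.array([[0.5, -0.25, 0.125],
                  [0.01, 0.02, -0.04],
                  [0.0, 0.0, 0.0]])
normalized, scale = normalize_bands(bands)
restored = restore_bands(normalized, scale)
print(scale)
```

One scale factor per band per frame is a small price for letting every band use the full numeric range of the stages that follow.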
In the next step of processing, the data in each band are analyzed for correlations, which are used to predict upcoming samples. The notion of waveform prediction is that, based on analysis of the current and preceding audio, one can make a reasonable prediction of the shape of the audio wave to come.
With a good prediction filter, the core characteristics of a correlated audio signal can be reduced to a few coefficients. When applied to band-limited signals, such as the output of the subband filter bank, the process can be effective.
Most of the time, the prediction process will not describe the exact waveform that occurs. To benefit from the prediction process, however, it's only necessary to approximate the signal so that the true signal can be described by its difference from the predicted values.
As part of the prediction process, the amount of correlation in the source signal is measured. That becomes important in the next phase of processing.
In the next coding stage, adaptive differential coding, the representation of the waveform in a given band is converted from absolute level to the difference in level between an expected value and the value that actually occurs. That is a process of simple subtraction between the predicted and the actual value (see Fig. 4).
Adaptive differential coding of that kind is only as good as the prediction used. As noted previously, prediction works only with correlated audio signals. Uncorrelated noise and transients do not benefit from the process at all. This is where the measure of correlation comes in. If the degree of correlation in the signal is too low, then a difference signal is measured between the current sample and the prior sample. That yields some advantage in coding efficiency, though not nearly as much as adaptive coding with a well-correlated input.
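The prediction-and-subtraction step can be sketched in Python as follows. The predictor here is a fixed, hypothetical one (it extends a straight line through the last two samples) rather than the adaptive filters a real coder would use, and `order=1` stands in for the simple previous-sample difference used when correlation is low. Because everything is integer arithmetic, decoding restores the input exactly.

```python
import math

def predict(history, order):
    """Guess the next sample from the most recent ones."""
    if order >= 2 and len(history) >= 2:
        return 2 * history[-1] - history[-2]   # extend the line through two samples
    if order >= 1 and len(history) >= 1:
        return history[-1]                     # previous-sample difference
    return 0

def encode(samples, order=2):
    """Replace each sample with its difference from the predicted value."""
    return [x - predict(samples[max(0, n - 2):n], order)
            for n, x in enumerate(samples)]

def decode(residuals, order=2):
    """Rebuild the samples by adding each residual to the same prediction."""
    samples = []
    for r in residuals:
        samples.append(predict(samples[-2:], order) + r)
    return samples

# A well-correlated test signal: a 440 Hz tone sampled at 44.1 kHz.
tone = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(512)]
residuals = encode(tone)
print(max(abs(x) for x in tone), max(abs(r) for r in residuals[2:]))
```

After the first two start-up samples, the residuals for this tone are far smaller than the signal itself, so they need fewer bits. Feed the same code white noise and the advantage disappears, which is the point made above about uncorrelated signals.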
The principle of entropy coding is simple. In any data stream, some values appear more frequently than do others. For every possible value, an alternative value is extracted from a lookup table. Data values (called symbols in this context) that occur frequently are represented by shorter strings of bits, whereas those that happen only occasionally are represented by longer strings. The result is a stream of variable-length words rather than the original fixed-length PCM data. The average word length in this stream will be smaller than that of the original, so there is a net savings. Decoding is performed by the reverse method: looking up shorter codes to extract the exact 16-bit (or 20-bit, 24-bit, or whatever) value that occurred in the original stream.
The key is in selecting a lookup table that works well for the data at hand. Audio data has some characteristics (for example, more samples around zero than at the extremes) that make it amenable to this kind of coding. There are many possible symbol tables, and a “dictionary” of tables is defined by the particular compression scheme. During encoding, a table from the dictionary is selected to provide the largest amount of data reduction.
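As an illustration of variable-length coding, here is a small Huffman coder in Python. It is a sketch under one important assumption: it builds its table from the data at hand, whereas the schemes described above pick a table from a predefined dictionary. The principle of giving short codes to frequent symbols is the same.

```python
import heapq
from collections import Counter

def build_code_table(symbols):
    """Build a Huffman table: frequent symbols get the shortest bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie_breaker, {symbol: code_so_far})
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    """Walk the bit string, emitting a symbol whenever a full code is seen."""
    reverse = {code: s for s, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out

# Made-up residual values clustered around zero, as audio data tends to be.
data = [0, 0, 0, 1, -1, 0, 0, 2, 0, -1, 0, 0, 1, 0, 0, -3]
table = build_code_table(data)
bits = encode(data, table)
print(len(bits), "bits instead of", 16 * len(data))
```

Decoding returns exactly the original values, so this stage is lossless on its own.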
Lossless and lossy coding methods can employ all of the techniques described in the previous section. The degree of precision and rigor applied may be quite different depending on the goal, but the principles remain the same.
Lossy compression methods also use a variety of techniques collectively known as perceptual coding. Those exploit the demonstrable principle that human beings cannot hear everything in an audio signal. Specifically, there are “masking” phenomena, found to be consistent for virtually all listeners, in which a signal of given frequency, perfectly audible by itself, cannot be heard when another, significantly louder component that is close in frequency is present.
In some descriptions of perceptual coding, the process is described as one of “identifying the components that cannot be heard and removing them,” but that is a bit misleading. What actually occurs in the common processes is a band-specific reduction in bit resolution, based on a calculation of the amount of quantization noise that can be masked by a signal in that band. Bit resolution for a given band can be reduced to as low as 0 bits, effectively removing that band (though generally that occurs only when no signal is present in that band).
MATRIXING AND GAIN
Stereo or multichannel-surround audio inevitably contains a lot of redundancy between channels. By identifying the content in common between channels, a substantial reduction in the amount of information to be coded for each channel can be achieved. Sum-and-difference matrices that are at the head of the audio-processing chain extract redundant information for more efficient coding.
Furthermore, every audio signal has an amplitude envelope. By extracting the envelope of the signal, the audio waveform's level can be normalized before processing. The envelope is preserved with the coded audio data and restored in decoding.
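A sum-and-difference matrix for a stereo pair can be sketched in Python like this. It is a minimal integer version chosen so that the round trip is exact; the function names are illustrative, and real coders may matrix more than two channels.

```python
def to_mid_side(left, right):
    """Sum-and-difference matrix for integer samples."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]   # average, rounded down
    side = [l - r for l, r in zip(left, right)]         # what differs between channels
    return mid, side

def to_left_right(mid, side):
    """Exact inverse of to_mid_side()."""
    left, right = [], []
    for m, s in zip(mid, side):
        total = 2 * m + (s & 1)   # the sum and the difference share the same parity
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

# Made-up 16-bit samples; the first three pairs are nearly identical.
left = [100, -50, 32767, -32768, 7]
right = [98, -53, 32767, 32767, -8]
mid, side = to_mid_side(left, right)
print(side)   # small values wherever the channels agree
```

When the two channels carry mostly the same material, the side signal stays close to zero and costs very little to code.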
The key to successful lossy compression is in analyzing the incoming signal to determine where resolution can be reduced. Fig. 5 illustrates the essential principles of audio masking. Fig. 6 shows a block diagram for an audio coder that incorporates bit allocation based on psychoacoustic analysis. Generally, the analysis is done using a fast Fourier transform (FFT) that provides an accurate representation of tonal and nontonal energy by band.
The results of the analysis are applied to a psychoacoustic model to determine the amount of masking in each band. In general, the amount of energy in a band defines a spreading function that determines masking. The more energy in a given band, the broader the range of frequencies it will mask. The spreads of the various bands overlap to create a composite masking curve (see Fig. 7).
The psychoacoustic models are based on large amounts of data gathered by researchers over many years. Most of that material is in the public domain, and the models used for all lossy methods draw from essentially the same pool of data. However, the models derived from this data can be more or less elaborate, and that is another area in which the quality of results can be affected by the complexity of the process. In MPEG audio, for example, two models are available. MP3 employs the second, more sophisticated of the two models. The exact ones that are used for proprietary processes such as AC-3 are confidential.
While researching this article, I found the following explanation of audio masking (reproduced here courtesy of Mattnet; www.mattnet.freeserve.co.uk): “Imagine you are in a room with some mice. If the room were completely silent and one of the mice were to fart, you may just about hear it. Now imagine there is a stick of dynamite in the room, and it explodes just as the mouse farts. The chances of you still hearing the fart are practically zero, owing to the fact that it has been drowned out by the … explosion.” That's as good an explanation as any I've heard.
The results of psychoacoustic analysis are applied to select the bit resolution to be used for each subband. By looking at the energy and audibility threshold for each band, the encoder can determine the lowest bit resolution that can be applied in that band. If total energy in a band falls below what is determined to be audible, then that band may be deleted.
Thus, subband coding adds content in the form of quantization noise, kept at levels that the signal will mask, and removes bands that fall below audibility. The combination of those processes results in the overall compression ratio.
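The bit-allocation decision described above can be sketched in Python with the familiar rule of thumb that each bit of resolution lowers quantization noise by about 6 dB. This is a toy illustration, not the allocation procedure of any particular codec; the band levels and masking thresholds are made-up numbers standing in for the output of a psychoacoustic model.

```python
import math

DB_PER_BIT = 6.02   # rule of thumb: one bit buys about 6 dB of signal-to-noise ratio

def allocate_bits(signal_db, mask_db, max_bits=16):
    """Pick the lowest bit resolution that keeps quantization noise masked."""
    bits = []
    for level, threshold in zip(signal_db, mask_db):
        if level <= threshold:
            bits.append(0)   # band is inaudible: drop it
        else:
            needed = math.ceil((level - threshold) / DB_PER_BIT)
            bits.append(min(needed, max_bits))
    return bits

# Hypothetical per-band signal levels and masking thresholds, in dB.
signal_db = [80, 62, 40, 15]
mask_db = [50, 55, 45, 20]
allocation = allocate_bits(signal_db, mask_db)
print(allocation)   # [5, 2, 0, 0]
```

The first band needs 5 bits to keep its noise 30 dB down, the second gets by with 2, and the last two fall below their thresholds and are removed.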
The highest performance in lossy compression is realized by combining the available techniques of lossless and lossy methods. Fig. 8 shows a block diagram of a coder using all of the methods discussed so far.
Localization of a signal falls off markedly as frequency increases, and that, too, can be exploited in lossy compression. If you've ever tried to pinpoint the source of a high-pitched whine, you may have an idea how this works — it's hard to tell where the whine is coming from.
By taking advantage of that phenomenon, you can mix the top end of both channels of a stereo pair together and code them as a single signal to be distributed between both channels on decoding. In some cases, the amplitude envelope of each channel is preserved, even though the signal beneath the envelope is the same for both channels.
You might wonder how developers test audio compression schemes. In the case of lossless packing, the testing is relatively straightforward: either the source signal is restored bit for bit or it's not.
For the lossy compression schemes that dominate the field, it's more difficult, especially as consumers gain experience and the stakes for performance are raised. By definition, all such schemes increase noise and distortion compared with the original signal, making comparison by conventional audio measurements meaningless. Somehow one has to determine how successfully the changes from the original signal are being hidden from the listener.
In the end, the evaluation of a compression scheme (which typically includes comparing it with different, competing schemes) comes down to an elaborate game of “If it sounds good, do it.” Listening tests are the only method that has been agreed upon as being useful for the evaluation of lossy coding schemes. (See the sidebar “A Rigorous Test” for an account of a series of tests that were conducted by the International Organization for Standardization.) Many questions remain, however, about the details of the methodology that should be used and about the validity of such tests. (Fig. 9 shows a system that can be used for triple-blind listening tests.)
The fact is that the success or failure of audio-data reduction schemes in the market seems to proceed quite independently of any formal testing. MP3 became a major phenomenon not because it was shown to work well through testing, but because millions of users enjoyed it.
I have mentioned a number of data packing and data compression schemes. Here's a closer look at some of the most common possibilities. This is not an exhaustive list by any means, but these should be of interest to musicians. I've also included notes on their technology and application.
MERIDIAN AUDIO MLP
MLP has the distinction of being the only lossless packing scheme that has qualified as an official standard to date, having been selected by DVD Forum as the standard form of data reduction for use in DVD-Audio. It's no wonder. Meridian has done an excellent job not only in developing the data packing algorithm but also in addressing needs of production and delivery in a real-world medium.
Meridian has been careful in qualifying MLP; the company recognizes that the amount of data reduction that can be achieved without loss will vary with the characteristics of the program. Data reduction ratios are specified as either “typical” or “minimum.” Some variation in those figures is based on resolution, sampling rate, and channel configuration (generally speaking, greater reduction can be accomplished for multichannel surround because of redundancies between channels), but the figures for typical ratios hover in the area of 50 percent (2:1) data reduction. During compression, warnings are generated if the required compression ratio cannot be achieved losslessly, and it is then up to the operator to make any needed changes in the program.
The table, “High-Density Stereo and Multichannel Options,” illustrates the application of MLP compression in DVD-Audio. DVD-A supports many options for the number of audio channels, bit resolution, and sampling rates. Some combinations of those exceed the total data transfer rate available for a standard DVD player, which is 9.8 Mbps. In those cases, MLP lossless coding is used to reduce the data rate to conform to the DVD standard.
MERGING TECHNOLOGIES LRC
A number of lossless audio-data packing schemes have been developed, but only a few have made it to the market. Merging Technologies' Lossless Realtime Coding (LRC) has been fully readied for license to provide compression and decompression programs for Mac, PC, and common digital signal processing (DSP) chips. Merging Technologies is focusing on offering the LRC technology for license to manufacturers of DAWs and other high-end audio systems that can benefit from lossless packing.
Merging Technologies states that LRC can reach a compression ratio of 3:1, depending on input signal. The charts provided in online documentation indicate that most real-world audio signals achieve a lossless ratio in the range of 2:1 to 2.5:1.
Many higher-end integrated audio workstations specify that audio remain uncompressed to maintain optimal audio quality. Although it's true that linear PCM guarantees that what's recorded on disc will be reproduced accurately, it's misleading to claim that the exclusive use of unreduced audio data will always give the best performance or even the best audio quality. Lossless, or “virtually” lossless, data packing offers the opportunity to use higher-resolution, higher-sampling-rate audio in a multitrack hard-disk recorder. If the data packing technology holds up, there is no reason to presume that this approach will be in any way inferior to linear, uncompressed PCM.
So far Roland's VS-series workstations are the only systems in widespread use to take this approach. The high-end VS-2480 and VS-1880 workstations offer linear PCM recording as well as recording modes that employ the company's proprietary Roland Digital Audio Coding (R-DAC) technology. Using R-DAC, as many as 16 channels of audio at 24 bits and a 96 kHz sampling rate can be recorded in real time, with three times the recording time available from linear PCM at the same resolution.
MPEG AUDIO LAYERS
MPEG-1, Audio Layer 3 audio compression, commonly called MP3, is certainly the most famous audio compression scheme in the world, thanks to the phenomenon of Internet downloads and the near-apoplectic legal response from the recording industry. MPEG Layer 1 and Layer 2 are not as well known.
The scheme of audio layers came about when MPEG-1 for video was originally defined. At that time, there was a recognized need for audio coding to meet individual needs and to address the technology available at different points in time. The three layers basically represent a hierarchy of complexity and performance. Layer 1 required the least complexity (and had the lowest latency) but has effectively passed out of use. Layer 2, which is substantially more complex, has seen widespread use in Video CD, DVD, and broadcasting.
Layer 3 (MP3) took a much bigger step. The basic framework of the encoding algorithm for Layers 1 and 2 is preserved, but additional elements are added to allow for compression that is substantially more efficient. The price is that MP3 takes significantly more digital processing to implement than Layer 2. It emerged in the late 1990s as the preferred method for audio coding of music because it could deliver quality acceptable to a broad range of listeners at considerably lower bit rates.
MPEG-2 AAC AND MPEG-4
Like Theodore Bikel's devil in the film 200 Motels, Advanced Audio Coding (AAC) is known by many names. Originally, it was called MPEG-2 NBC Audio, with “NBC” standing for “nonbackward compatible.” AAC is a collaborative effort aimed at creating a definitive standard for lossy audio coding — one that can deliver a lot better quality for a given bit rate than MP3 and, conversely, can provide acceptable audio performance at rates much lower than are required for the same perceived quality with MPEG-1 Layer 2 or Layer 3. Carefully controlled listening tests have demonstrated that those goals were achieved.
SONY ATRAC AND ATRAC3
Sony got an early start on mass-market application of audio-data compression when it released the MiniDisc format in 1992. As such, Sony was exposed to a fair amount of heat, in that consumer acceptance of audio perceptual coding had not yet been established. Also, the technology of lossy compression had not advanced to anything like its current state of development, particularly in the encoders that were built into early MiniDisc devices.
In the MiniDisc, ATRAC achieves a compression ratio of 5:1 over CD. This translates to about 280 kbps, which is much higher than the bit rates routinely used with MP3, AAC, and others. Sony has since released an updated version of ATRAC, known as ATRAC3, that claims to deliver the same level of audio quality as MiniDisc ATRAC at bit rates of 128 kbps, making it approximately equivalent to AAC.
DOLBY AC-3 (DOLBY DIGITAL)
After MP3, Dolby's AC-3 scheme is probably the best known audio compression scheme, thanks to its position as the de facto standard for DVD production. Whereas AC-3 can be (and is) applied to 2-channel audio, its roots lie in multichannel surround for theatrical playback and HDTV. The predecessors of AC-3 (AC-1 and AC-2) were methods for coding two channels of audio. For HDTV transmissions with surround audio, it was assumed that the source audio would be matrix coded in the manner of Dolby Pro Surround before being digitally compressed. The decoder would then decode two audio channels and dematrix them to yield surround sound with the same kind of performance you had before DVD.
Somewhere along the way, it became clear that audio compression technology offered an opportunity to do something much better by coding all of the channels of a 5.1 field. By taking advantage of interchannel redundancies, the bit rate for complete discrete surround, with total separation and full bandwidth on every channel, would not be a lot more than that needed for stereo. (The original target for 5.1 coding was 320 kbps, corresponding to the amount of bandwidth available within HDTV.)
It was one of those moments of brilliance you will have reason to be thankful for many years to come. One can find fault with the fidelity of 5.1 AC-3, but there is no doubt that the aural impact far exceeds anything previously available; it opened new worlds of pleasure for movie viewers and exciting opportunities for composers and music producers.
DTS COHERENT ACOUSTICS
DTS, with roots in movie-theater sound that started with Jurassic Park in 1993, actually markets two audio-data reduction technologies. The apt-X100 scheme, used exclusively in the company's theatrical systems business, employs a combination of linear prediction and adaptive quantization to deliver data reduction ratios of 4:1 with effectively lossless results, according to the company's literature.
DTS's offering for use in consumer DVD, CD, and Laserdisc is officially called Coherent Acoustics; it is represented on playback devices and discs as DTS Digital Surround. It is a more flexible algorithm than apt-X100 (say that three times quickly) and is designed to scale across data rates from 32 kbps up and to accept input sources as high as 24-bit resolution at a 96 kHz sampling rate.
Coherent Acoustics uses a combination of technologies; whether it qualifies as a perceptual coding system depends on the data rate and the compression ratio relative to the original source. At the bit rates recommended for use in DVD, the company states that perceptual coding techniques are not used, but it doesn't claim complete bit-for-bit restoration of the audio source. In this application, then, Coherent Acoustics qualifies for the select category of virtually lossless data reduction schemes, warranted to be fully transparent to the listener, but not fully lossless at the data level.
Coherent Acoustics is best known for its use in high-end surround for DVD. The company states that the original impetus for development was a drive to deliver superior sound within the bandwidth of CD and Laserdisc, potentially packing higher-resolution, higher-sampling-rate data within the limits of the established media. The combination of market opportunity in surround sound, united with a relative absence of consumer interest in ultrafidelity stereo, has led the company to emphasize its surround-sound applications.
Gary S. Hall lives and works in Alameda, California. His current interests include effects design for surround audio, DVD technology, meditation, travel, and recording. He's working with collaborators in Brazil, Switzerland, and upstate New York on a surround techno-tribal album for release on DVD.
A RIGOROUS TEST
For a fascinating view of how the process of testing audio-data compression schemes is pursued at the high end, check out www.tnt.uni-hannover.de/project/mpeg/audio/public/w2006.pdf. The PDF is a detailed account of the formal evaluation of MPEG AAC (Advanced Audio Coding) versus MPEG-1, Audio Layer 2 and MPEG-1, Audio Layer 3 (MP3) by the International Organization for Standardization (ISO).
The ISO is not known for doing things halfway, and the process of qualifying AAC as a standard proved no exception. This compression scheme, after all, represents the collaborative effort of nearly everyone in the business to create the be-all and end-all of lossy compression schemes, with a goal of delivering audio indistinguishable from the original at bit rates of 128 kbps.
The tests were conducted by the BBC, NHK, and MIT Media Labs, with participation by a number of other interested ISO members, including Deutsche Telekom, AT&T, and Scientific Atlanta. A number of audio samples were selected for the test, including stereo music clips and individual, “difficult” sources such as castanets, harpsichord, and pitch pipe. These were compressed using the three profiles of AAC coding (Main, Low Complexity, and SSR, for Scalable Sampling Rate) as well as with MPEG Layer 2 and MPEG Layer 3, at various bit rates. Care was taken to make sure the compressed samples represented the actual compression scheme, with third parties called on to verify that the test samples matched bit for bit the results of compression using standalone software.
For the test sessions, the selected samples were assembled onto DAT tapes following triple stimulus, double-blind, hidden reference methodology. That means that, for each test, three versions of the sample were presented to the listeners. The first was always the original, uncompressed PCM signal. Of the two following signals, one was a compressed version and the other was the original reference again, with no one but the preparers of the tapes (not the folks running the test session) knowing which was which or what compression scheme had been used. The inclusion of the hidden reference made it possible to determine whether the listeners could hear actual differences or not.
A test group of 31 individuals, all of whom were audio professionals, was assembled. Over a period of days, the test group was taken into a carefully prepared listening room and asked to evaluate a series of test samples. In each test, the listeners were asked to identify which signal was the hidden reference and to grade the other signal to one decimal place on a 1-to-5 scale, with 5 representing absolute fidelity, 4 representing perceptible but not annoying, and so forth.
The process yielded thousands of responses, which later were subjected to careful analysis. First, it was necessary to determine whether the respondents had been able to distinguish the compressed signal from the hidden reference — particularly tricky, because at least some settings were expected to yield indistinguishable results. It was determined, however, that the pattern of response would indicate the validity of the tests. The evaluations of quality were then assessed for averages and spreads (range of responses by different individuals) and for distinct signal types and compression schemes.
When the final results (which are included in the ISO report) were in, the ISO concluded that AAC at 128 kbps was not completely transparent for all sources. Interestingly, a majority of respondents felt that Suzanne Vega sounded better after being coded with AAC at 128 kbps. More importantly, the test indicated that AAC at 128 kbps did, in fact, significantly outperform MP3 at the same rate and delivered performance equivalent to MPEG Layer 2 audio at 192 kbps. Those are impressive results that demonstrate that MPEG AAC is the state of the art in data compression for stereo signals.
Quite a few books on data compression are available; most discuss image and video compression, usually with shorter sections on audio. That is more useful than it appears, as there is substantial overlap in the techniques used.
The Data Compression Book, by Mark Nelson and Jean-Luc Gailly (Hungry Minds, Inc., 1995)
Data Compression: The Complete Reference, by David Salomon (Springer-Verlag, 2000)
Digital Video and Audio Compression, by Stephen J. Solari (McGraw-Hill Professional Publishing, 1997)
Introduction to Data Compression, by Khalid Sayood (Morgan Kaufmann Publishers, 2000)
The MPEG Handbook: MPEG-1, MPEG-2, MPEG-4, by John Watkinson (Butterworth-Heinemann, 2001)
MPEG Video: Compression Standard, Joan L. Mitchell, editor (Kluwer Academic Publishers, 1996)
High-Density Stereo and Multichannel Options
DVD-Audio supports a broad range of channel configurations, bit depths, and sampling rates. MLP data packing is required for delivery of multichannel surround at higher densities. Even so, the highest sampling rates available in DVD-A (176.4 and 192 kHz) cannot be supported for multiple channels.
| Audio Format | Data Transfer Rate (Mbps) | Supported in DVD-A |
| --- | --- | --- |
| 2 channels, 96 kHz, 24 bits | 4.6 | as uncompressed PCM |
| 2 channels, 192 kHz, 24 bits | 9.2 | as uncompressed PCM |
| 6 channels, 48 kHz, 24 bits | 6.9 | as uncompressed PCM |
| 6 channels, 96 kHz, 24 bits | 13.8 | as MLP |
| 6 channels, 192 kHz, 24 bits | 27.6 | no |
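The figures in the table follow directly from channels × sampling rate × bit depth. The Python sketch below reproduces them and checks each format against the 9.8 Mbps limit quoted in the main text. The 2:1 packing ratio is the "typical" figure cited for MLP, used here as an assumption; actual ratios vary with the program.

```python
LIMIT_MBPS = 9.8          # transfer-rate limit quoted in the main text
TYPICAL_MLP_RATIO = 2.0   # assumption: the "typical" 2:1 reduction cited for MLP

def pcm_mbps(channels, sample_rate_hz, bits):
    """Linear PCM data rate in megabits per second."""
    return channels * sample_rate_hz * bits / 1_000_000

formats = [(2, 96_000, 24), (2, 192_000, 24), (6, 48_000, 24),
           (6, 96_000, 24), (6, 192_000, 24)]

results = []
for channels, rate, bits in formats:
    raw = pcm_mbps(channels, rate, bits)
    if raw <= LIMIT_MBPS:
        verdict = "as uncompressed PCM"
    elif raw / TYPICAL_MLP_RATIO <= LIMIT_MBPS:
        verdict = "as MLP"
    else:
        verdict = "no"
    results.append((round(raw, 1), verdict))
    print(f"{channels} ch, {rate // 1000} kHz, {bits} bits: {raw:.1f} Mbps, {verdict}")
```

Six channels at 96 kHz fit only after packing, and six channels at 192 kHz would still need about 13.8 Mbps at 2:1, which is why that combination is not supported.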
There are a large number of sites and links for audio-data compression. Many of the best are maintained by companies and institutes directly involved in developing and marketing the various competing schemes. The following links take you past the home page to the parts most directly concerned with audio compression technology.
DTS Professional Audio
Merging Technologies, Inc.
www.mpeg.org/MPEG/audio.html and www.mpeg.org/MPEG/mp3.html