Illustration: Dmitry Panich
Even as consumers continue to flock to the low-res world of MP3, RealAudio, and Windows Media Audio, the recording industry is moving steadily in the opposite direction, toward higher bit depths and higher sampling rates. Despite that irony, virtually nobody on the production side of the equation doubts that our efforts to improve audio quality are worthwhile.
There are, however, certain voices questioning whether higher bit depths and higher sampling rates alone are the best path to sonic nirvana. Perhaps a fundamental shift in our approach to digitizing and delivering audio is in order.
Sony and Philips have made just such a shift and are promoting their Super Audio Compact Disc (SACD) as an alternative to traditional digital audio media such as CD and the newer DVD-Audio discs. SACD takes quite a different approach to representing audio in the digital domain, and advocates say it combines the best of what we think of as “analog” and “digital” characteristics.
FORGET WHAT YOU KNOW
The principles of Pulse Code Modulation (PCM) audio encoding are so ingrained in our thinking about digital audio technology that it may not have occurred to many people that there are other ways to represent audio with ones and zeros. In the same way that movie film simulates motion by capturing a series of still images and then playing them back in rapid succession, CD-quality PCM measures (captures) the amplitude of a waveform 44,100 times per second and stores each measurement as a 16-bit value. On playback these binary numbers are used to modulate an output voltage into a convincing representation of the original waveform.
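To make the PCM idea concrete, here is a minimal sketch in Python. It assumes a pure sine wave as the input and uses the hypothetical helper name sample_pcm; a real converter measures a voltage rather than computing one, so treat this as an illustration of the numbers involved and not as a model of any particular A/D converter.

```python
import math

SAMPLE_RATE = 44_100   # CD-Audio measurements per second
BIT_DEPTH = 16         # bits stored per measurement

def sample_pcm(freq_hz, duration_s, amplitude=1.0):
    """Measure a sine wave SAMPLE_RATE times per second and
    store each measurement as a signed 16-bit integer."""
    max_code = 2 ** (BIT_DEPTH - 1) - 1          # 32767 for 16 bits
    count = round(SAMPLE_RATE * duration_s)      # number of measurements
    samples = []
    for i in range(count):
        t = i / SAMPLE_RATE                      # time of this measurement
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(value * max_code))  # quantize to an integer code
    return samples

# Ten milliseconds of a 1 kHz tone becomes 441 sixteen-bit numbers.
samples = sample_pcm(1_000, 0.01)
print(len(samples), samples[:5])
```

Each number in the list is one "still frame" of the waveform, in the sense of the film analogy above.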
FIG. 1: In the 1-bit world of DSD, a string of ones represents high positive amplitude and a string of zeros represents high negative amplitude. For this reason the technique is also known as Pulse Density Modulation.
SACD uses a different digital technique that Sony calls Direct Stream Digital (DSD). In DSD, the amplitude of the waveform is represented by the relative density of the ones and zeros within the bitstream; for that reason the technique is also known as Pulse Density Modulation (PDM). A full positive waveform results in all ones, whereas a full negative waveform results in all zeros (see Fig. 1). Silence yields a string of alternating ones and zeros, which reflects the “neutral” density of a signal that is neither positive nor negative. Sony claims a resulting dynamic range of 120 dB for DSD.
In the simplest terms, DSD compares the energy of a waveform at each sample point to the accumulated energy of previous sample periods. If the energy is higher, a value of 1 is recorded; if the energy is lower, a value of 0 is recorded. Because the bitstream represents the change (delta) in energy compared to the sum (sigma) of the previous energies, another name for this technique is Sigma-Delta Modulation (SDM).
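The behavior described above can be sketched with a textbook first-order sigma-delta modulator. This is an assumption-laden illustration, not Sony's converter design: the function name sigma_delta is hypothetical, the input is assumed to be already scaled to the range -1.0 to 1.0, and production DSD converters are generally more elaborate than this single-integrator loop.

```python
def sigma_delta(samples):
    """First-order 1-bit modulator (illustrative sketch).

    Each input is expected in the range -1.0 .. 1.0. The running sum
    (sigma) of the difference (delta) between the input and the
    previously emitted bit decides whether the next bit is 1 or 0.
    """
    bits = []
    integrator = 0.0   # accumulated difference so far
    feedback = 0.0     # analog value of the previous output bit
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

print(sigma_delta([1.0] * 8))    # full positive level: all ones
print(sigma_delta([-1.0] * 8))   # full negative level: all zeros
print(sigma_delta([0.0] * 8))    # silence: alternating ones and zeros
```

Feeding the loop a constant level of 0.5 yields ones three-quarters of the time, which is the "density" that gives Pulse Density Modulation its name.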
DSD uses a sampling rate of 2.8224 MHz, 64 times the rate CD-Audio uses. As you might expect, this has a positive effect on frequency response, which is said to extend to 100 kHz. One of the important limitations of PCM is described by the Nyquist theorem, which states that the highest frequency you can capture with any given sampling rate is one-half that sampling rate. As a result, CDs by definition can contain no frequencies higher than 22.05 kHz, and 96 kHz PCM audio must filter anything over 48 kHz.
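The arithmetic behind those figures is easy to check. The short sketch below uses a hypothetical helper, nyquist_limit, and only the sampling rates quoted in this article.

```python
CD_RATE_HZ = 44_100
DSD_RATE_HZ = 64 * CD_RATE_HZ   # 2,822,400 Hz, that is, 2.8224 MHz

def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can capture."""
    return sample_rate_hz / 2

for name, rate in [("CD", CD_RATE_HZ), ("96 kHz PCM", 96_000), ("DSD", DSD_RATE_HZ)]:
    print(f"{name}: sampling rate {rate} Hz, Nyquist limit {nyquist_limit(rate)} Hz")
```

Note that the 100 kHz frequency response quoted for DSD sits far below the theoretical half-sampling-rate ceiling of its 1-bit stream.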
If you're asking, “Why would I want to record frequencies no one can hear?” you're not alone. One reason is that frequencies we can't hear as pure tones may nevertheless audibly influence the color of complex tones. In particular, the character of transients may be influenced by components whose frequencies are above the official range of our hearing.
Another point in favor of high sampling rates is that timing differences smaller than 1/44,100 of a second may affect our ability to localize sounds. That suggests that DSD should be capable of superior imaging in stereo and surround applications, and naturally Sony and Philips claim exactly that.
One more justification for DSD is that the steep filters required for PCM to conform to the Nyquist theorem can themselves degrade the sound due to their sharp cutoff. In fact, it has become common for PCM analog-to-digital (A/D) converters to start with a 1-bit conversion stage and then apply a digital decimation filter to convert the bitstream to PCM format. DSD essentially bypasses the decimation filter and stores the 1-bit data directly.
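For a rough feel of what a decimation stage does, the sketch below averages each block of 64 one-bit samples into a single 16-bit PCM value. This is the crudest possible approach and is only an assumption for illustration: the function name decimate is hypothetical, and real converters use carefully designed low-pass filters rather than a simple block average.

```python
def decimate(bits, factor=64, bit_depth=16):
    """Collapse each block of `factor` 1-bit samples into one PCM value
    by measuring the density of ones in the block (illustrative only)."""
    max_code = 2 ** (bit_depth - 1) - 1            # 32767 for 16 bits
    pcm = []
    for start in range(0, len(bits) - factor + 1, factor):
        block = bits[start:start + factor]
        density = sum(block) / factor              # 0.0 (all zeros) .. 1.0 (all ones)
        pcm.append(round((2 * density - 1) * max_code))
    return pcm

print(decimate([1] * 64))        # all ones: full positive PCM value
print(decimate([0] * 64))        # all zeros: full negative PCM value
print(decimate([1, 0] * 32))     # alternating bits: zero, i.e. silence
```

Sixty-four bits go in and one number comes out, which is why a 2.8224 MHz bitstream decimated by 64 lands at the familiar 44.1 kHz.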
Similarly, a PCM digital-to-analog (D/A) converter typically uses oversampling and interpolation to convert the PCM data back to a 1-bit signal before output. Put simply, oversampling creates numerous new samples between the original samples, and interpolation makes an educated guess at the proper values for these in-between samples. You may have seen the term “1-bit/64x oversampling DAC” on CD players; this is the process it refers to.
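The "educated guess" can be illustrated with the simplest interpolation there is: drawing a straight line between neighboring samples. The helper name oversample_linear is hypothetical, and linear interpolation is an assumption made for clarity; actual D/A converters typically rely on digital interpolation filters that are considerably more sophisticated.

```python
def oversample_linear(samples, factor):
    """Insert factor - 1 new samples between each pair of originals,
    guessing their values by straight-line interpolation."""
    if not samples:
        return []
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)   # point k/factor of the way from a to b
    out.append(float(samples[-1]))                 # keep the final original sample
    return out

# Three original samples at 4x oversampling become nine.
print(oversample_linear([0, 4, 8], 4))
```

The original values survive untouched; everything between them is an estimate.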
Interpolation and oversampling are unnecessary with DSD because it is already in the 1-bit form. If the terms “decimation” and “educated guess” make you suspicious about the accuracy of PCM A/D and D/A conversion, you have great company among DSD advocates, who claim that those steps degrade the audio quality. DSD neatly avoids decimation and interpolation by staying in the 1-bit form from beginning to end.
Physically, SACD is part of the DVD family. (If Sony and Philips had had their way, it would have been the basis of the DVD-Audio specification, but the other members of the DVD Forum opted to stay with PCM.) It therefore has a single-layer capacity of 4.7 billion bytes and can hold almost twice as much with the addition of a second layer. SACD currently doesn't take advantage of DVD's double-sided possibilities.
When the DSD audio is encoded with Philips' lossless Direct Stream Transfer (DST) codec, a single layer has enough room for two 74-minute versions of the audio, one in stereo DSD and the other in 6-channel surround DSD. Like other lossless audio codecs, DST typically achieves a data reduction of about 50 percent.
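A back-of-the-envelope check of those capacity figures follows. It assumes that every channel, including all six surround channels, runs at the full 2.8224 MHz 1-bit rate, and it ignores any overhead for file systems, text, or other extras; the helper name raw_dsd_bytes is hypothetical.

```python
DSD_RATE_HZ = 2_822_400          # 1-bit samples per second, per channel
LAYER_BYTES = 4_700_000_000      # single-layer capacity quoted above
MINUTES = 74

def raw_dsd_bytes(channels, minutes, rate_hz=DSD_RATE_HZ):
    """Uncompressed size of a DSD program: one bit per sample per channel."""
    return channels * rate_hz * minutes * 60 // 8

stereo = raw_dsd_bytes(2, MINUTES)
surround = raw_dsd_bytes(6, MINUTES)
total = stereo + surround
fraction_needed = LAYER_BYTES / total   # share of raw size that fits on one layer

print(f"stereo:   {stereo:,} bytes")
print(f"surround: {surround:,} bytes")
print(f"total:    {total:,} bytes")
print(f"must shrink to {fraction_needed:.1%} of raw size to fit one layer")
```

Under these assumptions the uncompressed total is about 12.5 billion bytes, so the two programs together must shrink to a little under 40 percent of their raw size to share one layer. That suggests the 50 percent figure is best read as a rough typical value, with the actual reduction depending on the material.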
Available nonaudio enhancements include text or lyrics, still graphics, and video. However, audio is clearly SACD's raison d'être, so don't expect to be watching The Matrix on SACD any time soon.
One of the most intriguing possibilities of SACD, though, is a low-resolution application of the second data layer. A hybrid CD/SACD disc can accommodate a high-density (HD) layer of DSD audio and a Red Book layer that's readable by current CD players (see Fig. 2). An “old-fashioned” CD player's laser sees through the HD layer's semireflective coating to the reflective Red Book layer and never knows the disc is anything other than a simple CD-DA. With its shorter wavelength, however, the laser of an SACD player is able to focus on the HD layer and take full advantage of its additional capacity. It just doesn't get any more backward compatible than that.
FIG. 2: A hybrid SACD features a Red Book layer and a high-density DSD layer. To the CD player's laser, the semireflective coating of the DSD layer is invisible, so it reads the Red Book data without noticing the disc's high-resolution properties.
DSD IN THE PROJECT STUDIO
So how long before we're all burning SACDs in our own studios? My trusty Magic 8 Ball says, “Outlook is murky.” There are, as I see it, three basic obstacles between the typical project studio and cost-effective SACD production: conversion, processing, and production. In time, all three may fall away nicely.
For starters, your current A/D/A converters are useless for SACD. Even though they probably use 1-bit processes as discussed earlier, they don't have a switch to turn off the decimation and interpolation filters. But then again, you were just looking for an excuse to buy new gear, weren't you?
A handful of DSD recorders and converters are available, ranging in price from $4,000 to about $30,000. One such option is the Pyramix DAW from Merging Technologies (see Fig. 3). Its current DSD option supports up to eight channels of 1-bit recording, editing, mixing, and playback. That happens to equal the track capacity of Sony's own Sonoma system, which is not commercially available. Until there are more DSD recorders with significantly higher track counts, don't expect to see too much popular music going the DSD route.
Once you have your bitstream on disc or tape, you may want to edit, process, mix, and master it. The math involved in processing DSD, however, is fundamentally different from the math used in the well-developed world of PCM processing. In fact, Pyramix converts the bitstream to high-resolution PCM for dynamics, EQ, and reverb processing. Other systems use decimation to PCM for metering and waveform display.
So is processing DSD's Achilles' heel? Not necessarily. After all, it's not uncommon for engineers to record to digital and then mix through an analog console, or to insert analog processors within an otherwise all-digital signal path. If we need to use PCM or analog processors within a DSD project, it doesn't entirely negate the benefits of the technology. It does, however, undermine the claims of the format's most over-the-top advocates, who have been proclaiming the imminent — indeed, overdue — demise of PCM.
FIG. 3: Pyramix, from Merging Technologies, is the first moderately priced DAW to feature DSD support. It currently supports up to eight channels of recording, mixing, and editing. Notice the sampling rate of 2.8224 MHz shown at the very bottom center of the screen.
These days it's expected that any studio, no matter how modest, will be capable of burning a reference CD-R of a session or mix, enabling the client to audition it on any home, car, portable, or personal CD player. That is currently not possible with SACD, and the format may never reach that level of ubiquity.
Although it's possible that SACD authoring software will eventually take advantage of recordable DVD drives, at this point you must use digital data tapes such as DLT or AIT to deliver assets to the replication facility. That represents a backward step of about a decade even for commercial studios — back to the days when you didn't really know how a project had turned out until FedEx delivered a reference disc from the manufacturer for your approval.
Even if DVD-R is leveraged for burning one-off SACDs, the format's layers of copy protection may still get in the way. SACD content is protected by an impressive combination of techniques, from visible watermarks that recordable drives can't produce to a scrambled lead-in that confounds CD-ROM drives to encryption of the disc's data. The most troublesome element, though, is an invisible watermark called the Pit Signal Processing Physical Disc Mark (PSP-PDM).
The PSP-PDM must be detected by an SACD player to initiate playback, and it also contains part of the decryption key. Since it can't be created by a recordable DVD drive, a reference disc will not be playable in a normal SACD player. Philips and Sony claim to be committed to keeping copy protection from getting in the way of recording and authoring, but it appears that the convenience we enjoy in creating CDs (to say nothing of short-run production of CDs) will not be matched in SACD production.
AND THE WINNER IS
There's no doubt that with SACD Sony and Philips are aiming squarely at the same niche as DVD-Audio. It would be a classic format war except that so far most of the world has failed to take notice. A glance at the shelves of any electronics superstore demonstrates that consumers are still too satisfied with CDs, too enthusiastic about MP3s, and too comfortable with DVD-Video to turn their attention to either audiophile format.
Let's look at the scorecard anyway. SACD offers producers some heavy-duty copy protection and offers consumers backward compatibility with existing CD players. It also requires studio owners, many of whom are still paying off their upgrades to 24/96 PCM, to invest in new gear.
DVD-Audio is not compatible with existing CD players, but it is compatible with existing PCM recorders and processors. Affordable software for burning reference or short-run DVD-Audio titles on DVD-R is already available.
Both formats offer creative possibilities such as surround sound, graphics, text, lyrics, and video. Neither is compatible with existing DVD-Video players. Both offer sound quality that surpasses existing formats, but which sounds better? With all due respect to each camp, I have to conclude that the jury's out, and it probably will be for some time. Rereleased analog recordings account for much of the material being released on both formats, so there are a lot of variables other than the inherent differences between PCM and DSD. Furthermore, the use of PCM processing in the DSD production chain, the use of analog processing in either format, and the various techniques used to extrapolate surround channels from stereo masters all muddy the waters sufficiently to keep me undecided for the time being.
The deciding factor may turn out to be what consumers find on the store shelves, in which case SACD has a decided advantage. Sony has one of the world's largest catalogs of recordings, and they've been aggressively releasing them in SACD format. Although the few hundred SACD titles available outnumber the DVD-Audio catalog, they are still short of critical mass in the minds of consumers. In the end, regardless of who wins the format war, when John and Jane Musiclover find a reason to get excited about buying recordings, we're all in a position to benefit.
Brian Smithers plays, teaches, records, writes, and occasionally relaxes in sunny central Florida.