What's in a Word?

FIG. 1: When any digital recording is made, word size will limit the precision of the recording. When only two bits are used (top), a sine wave appears as a square wave when played back. Using three bits (middle) improves the precision, and using four bits (bottom) improves it even more.

Before digital signal processing (DSP) became the norm in music production, adding delay effects or compression meant running the audio through a hardware device that inevitably added noise. Things are different with digital audio, because plug-ins crunch tidy streams of numbers. Noise problems remain, but they take a different form, because digital audio processing cannot be performed with perfect accuracy. This article looks at why that is and discusses what can be done to minimize the damage.

The Digital Ruler

In the PCM (pulse code modulation) audio found on CDs and DVDs, the precision of the measurement system used for sample values is always a limiting factor. Imagine trying to measure the length of a coastline (a mathematical conundrum explored by fractal pioneer Benoit Mandelbrot). Measurements vary, because their accuracy depends on the size of the ruler being used. The smaller the ruler, the more precisely it accounts for tiny curves and inlets. Because there is no ruler small enough to capture every detail with perfect precision, there is always some estimation involved. In audio, the subtle changes in a signal's amplitude are no less intricate than the details of a coastline.

Digital audio systems are only so precise, because they use a finite number of measuring increments (see Fig. 1). Like any type of computer data, sample values are stored as binary numbers, and the number of bits used to represent those numbers determines how precise they are: each added bit doubles the number of available increments, so 2 bits give 4 levels, 3 bits give 8, and 4 bits give 16. With more bits, smaller, more accurate increments between extreme high and low amplitude levels can be stored. Every instantaneous amplitude level is given the value of the nearest measuring increment, a process called quantization. (The term is borrowed from quantum physics; in the simple Bohr picture of the atom, electrons orbit the nucleus at fixed distances. An electron can orbit at distance A or distance B, but never anywhere between the two.) Working with a higher bit system is like working with a ruler that measures in 1/32-inch increments instead of 1/8-inch increments.
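The idea of snapping each amplitude to the nearest increment can be sketched in a few lines of Python. This is a simplified illustration, not how any particular converter works: it assumes samples lie between -1.0 and 1.0 and spreads the levels evenly across that range, and the function names (quantize, sine_cycle, max_error) are made up for the example.

```python
import math

def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels.

    Assumption: levels are spread symmetrically from -1.0 to 1.0, which is a
    simplification of real PCM number formats.
    """
    levels = 2 ** bits
    step = 2.0 / (levels - 1)              # distance between adjacent levels
    index = round((sample + 1.0) / step)   # nearest level, counted from the bottom
    return -1.0 + index * step

def sine_cycle(num_samples=64):
    """One cycle of a full-amplitude sine wave."""
    return [math.sin(2 * math.pi * n / num_samples) for n in range(num_samples)]

def max_error(bits, num_samples=64):
    """Largest gap between a true sample and its quantized value."""
    return max(abs(s - quantize(s, bits)) for s in sine_cycle(num_samples))

if __name__ == "__main__":
    for bits in (2, 3, 4, 16):
        print(bits, "bits -> worst-case error", max_error(bits))
```

Running the script prints a worst-case error that shrinks as the bit count grows, which is the same trend Fig. 1 shows graphically.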

But no matter how many bits are used, sample measurements can never be fully accurate. Because there will always be differences between the actual infinitely variable amplitude level of a signal and the nearest available measuring increment, some degree of distortion, or quantization noise, is inevitable. The problem is compounded at low amplitude levels because the full range of sample values doesn't get used. That makes the error a greater percentage of the signal (see Fig. 2).
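To see why quiet signals fare worse, one can compare the power of a signal with the power of its quantization error. The following is a rough sketch under stated assumptions: an 8-bit word size chosen only to make the effect obvious, a sample range of -1.0 to 1.0, no clipping at the extremes, and a hypothetical helper named signal_to_error_db.

```python
import math

def signal_to_error_db(amplitude, bits, num_samples=1000):
    """Ratio of signal power to quantization-error power, in decibels.

    Assumptions: samples range over -1.0..1.0 and clipping is not modeled.
    """
    step = 2.0 / (2 ** bits)   # size of one measuring increment
    signal_power = 0.0
    error_power = 0.0
    for n in range(num_samples):
        # 37 cycles across the buffer spreads the sample values out well
        x = amplitude * math.sin(2 * math.pi * 37 * n / num_samples)
        q = round(x / step) * step   # nearest measuring increment
        signal_power += x * x
        error_power += (x - q) ** 2
    return 10 * math.log10(signal_power / error_power)

if __name__ == "__main__":
    print("full amplitude:", signal_to_error_db(1.0, 8), "dB")
    print("low level:     ", signal_to_error_db(0.01, 8), "dB")
```

The low-level signal reaches only a handful of increments, so its ratio comes out far lower than the full-amplitude one even though the increment size is identical.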

There are many names for the number of bits in a system, including quantization level, word size, and bit depth. CD audio uses word sizes of 16 bits (2 bytes), and high-end DVD audio uses 24 bits (3 bytes). Some attribute DVD audio's superiority over CD audio to that greater bit depth.

Oh, How They Multiply!

DSP operations perform arithmetic (usually multiplication) on samples. Some operations take averages from groups of samples, while others multiply all samples by a single value. Multiplication creates longer word sizes. Even a simple act — such as moving a volume fader — can add bits because of the multiplications used. That principle is as true in the decimal system as it is in the binary system. A simple multiplication problem such as 15 × 15 = 225 starts with two-digit numbers (15) but results in a three-digit number (225). If we're restricted to working with only the two leftmost digits, we'd have to chop off the ones column. That would leave us with the problem of whether 225 should be rounded to 220 or 230. Therein lies the quandary of digital audio: what is the best way to approximate sample values when you have limited precision available?
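The word growth described above is easy to check. The sketch below uses Python integers, which can grow without limit, to show how many bits a product needs, and then repeats the 15 × 15 example in decimal. The helper names are invented for illustration, and keep_top_digits assumes non-negative values.

```python
def magnitude_bits(value):
    """Number of binary digits needed for the magnitude of an integer."""
    return abs(value).bit_length()

def keep_top_digits(value, digits):
    """Chop off low-order decimal digits so only the leftmost `digits` remain.

    Assumption: value is non-negative. This truncates rather than rounds.
    """
    excess = len(str(value)) - digits
    if excess <= 0:
        return value
    scale = 10 ** excess
    return (value // scale) * scale

if __name__ == "__main__":
    biggest_16_bit = 32767                   # largest positive 16-bit signed sample
    product = biggest_16_bit * biggest_16_bit
    print(magnitude_bits(biggest_16_bit))    # 15
    print(magnitude_bits(product))           # 30
    print(keep_top_digits(15 * 15, 2))       # 220
```

Two 15-bit magnitudes produce a 30-bit product, just as two 2-digit decimal numbers produce a 3-digit one, and something has to give when the result must fit back into the original word size.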

Most processors work with extra-long word sizes to give DSP operations some breathing room. But at some point, you'll need to cut down the word sizes to 16 bits if you're mixing for CD. Say you're working in a system that offers 32-bit processing. You compress, add reverb, and then adjust the overall amplitude. Now it's time to prepare your final 16-bit mix, meaning it's time to reduce word sizes by one-half. Using decimal numbers as an example, imagine all your samples during processing are integers between 0 and 1,000; for the final version, however, you have to lose the ones column, leaving values only in increments of 10 (10, 20, 30, … 980, 990, 1,000). Truncating the samples simply discards the ones digit, dropping every value to the increment of 10 at or below it. That is a crude approach, because nuances in things such as stereo spatialization and reverb require very fine variations in the sample values. To get around that, most programs apply redithering (or dithering down) when word sizes are reduced.

FIG. 2: The signal at full amplitude uses the entire range of sample values. The low-level signal, while equally complex, can use only the middle four sample values. Thus, a low-level signal can't be represented as accurately as a high-level signal.

Redithering is similar to the dithering process performed during recording, except that it's done at the back end (see “Square One: Dithering Heights” in the December 1996 issue of EM). Before the lower-valued bits get lopped off, noise (a random value within the range of the bits to be removed) is added to each outgoing sample. If we need to remove the ones column, adding a random number between one and nine to each sample leaves to chance whether we round up or round down to the nearest increment of ten. The result maintains traces of the activity between increments. The final output has greatly improved dynamic range, albeit with some degree of noise.
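The decimal version of that idea can be tried directly. This is only a sketch: the function names are hypothetical, and it draws the random amount from 0 through 9 (rather than 1 through 9) so that the long-run average comes out unbiased, which is an assumption of the example and not a description of any commercial algorithm.

```python
import random

def truncate_to_tens(value):
    """Drop the ones column, leaving a multiple of ten."""
    return (value // 10) * 10

def redither_to_tens(value, rng):
    """Add a random amount within the range being removed, then truncate."""
    return truncate_to_tens(value + rng.randint(0, 9))

def average_output(value, trials=10000, seed=1):
    """Long-run average of the redithered value (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(redither_to_tens(value, rng) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    print(truncate_to_tens(224))   # always 220
    print(average_output(224))     # hovers near 224
```

Plain truncation turns 224 into 220 every time, so the 4 is lost for good. With the added noise, the output lands on 230 some of the time and 220 the rest, and the average over many samples stays close to 224. That is the sense in which traces of the activity between increments survive.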

Proprietary redithering algorithms abound, from Waves IDR to Sony's Super Bit Mapping. (Sony also offers an alternative to PCM audio with its SACD format. See “Optical Wars: DVD vs. SACD” in the November 2004 issue of EM.) Visit http://audio.rightmark.org/lukin/dither/dither.htm for a comparison of those and other approaches.

Time to Get in Shape?

Adding noise is an imperfect solution. But its ill effects can be minimized through noise shaping, which is an advanced form of redithering that filters the quantization-error distortion, pushing it out of the audible range (and making it much less bothersome). The best noise shapers attempt to spread the error noise just below the hearing threshold by taking equal-loudness curves into consideration.
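Commercial noise shapers are proprietary, but the basic mechanism can be illustrated with first-order error feedback, in which each sample's rounding error is carried over and subtracted from the next sample before it is rounded. The sketch below is a bare-bones illustration under that assumption: it leaves out the dither noise and the equal-loudness weighting that real products add, and its function names are invented for the example.

```python
def quantize_plain(samples, step):
    """Round every sample to the nearest increment independently."""
    return [round(x / step) * step for x in samples]

def quantize_noise_shaped(samples, step):
    """First-order error feedback (a simple form of noise shaping).

    The rounding error made on one sample is subtracted from the next
    sample before rounding, so errors cancel over time instead of piling up.
    """
    out = []
    carried_error = 0.0
    for x in samples:
        target = x - carried_error
        q = round(target / step) * step
        carried_error = q - target   # error to compensate for next time
        out.append(q)
    return out

if __name__ == "__main__":
    samples = [3.3] * 100            # a steady level that falls between increments
    print(sum(samples))
    print(sum(quantize_plain(samples, 1.0)))
    print(sum(quantize_noise_shaped(samples, 1.0)))
```

With plain rounding, every sample becomes 3 and the running total drifts steadily away from the true one. With error feedback, the output alternates between 3 and 4 so that the total never strays by more than half an increment. The error has not disappeared; it has been turned into rapid sample-to-sample fluctuation, which is to say it has been moved toward high frequencies.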

Ideally, you want to stay in the high-bit-depth arena during all preliminary processing and storage. For example, if you do your processing in a 48-bit environment, but between sessions you save your work to 16-bit DAT or as 16-bit AIFF or WAV files, you have to shorten the word lengths and redither every time you store your work. That essentially smudges the work of your session. Noise shaping is helpful, but you can never be sure where in the spectrum the error noise will wind up. If you try to work with files that previously had noise shaping applied to them, you might get some unpleasant surprises due to difference frequencies (also called combination tones). Consequently, noise shaping is something that you want to use only as the final stage of production.

For example, maybe you want to crossfade between two files that have been noise shaped. Individually, each file sounds great. What you may not realize is that one file had its quantization error pushed to, say, 18 kHz, while the other had its distortion pushed to 18.1 kHz. Nothing seems amiss until you try to crossfade the two, and the multiplications involved create sonic grit due to a difference frequency centered at 100 Hz. Be sure to keep track of which files have had noise shaping applied, and consider making a backup of everything before you apply it.

The rule of thumb for audio word sizes is “bigger is better.” Record at the highest bit depth that you can, process in big words, and mix to the highest quantization level available.

Mark Ballora teaches music technology at Penn State University.