Digging Into Digital Audio

The same basic principles apply to all forms of digital-audio recording, storage, and playback. This includes samplers, digital multitrack tape decks, DATs, hard-disk recorders, and CDs. If this realm remains foreign to you, read on.

Sound is an analog phenomenon. The changing air pressure that pushes and pulls on our eardrums varies smoothly rather than jumping discretely from one pressure to another. Most electrical audio signals are also analog; the voltage in a cable varies smoothly in a way that mimics the changing air pressure of the sound represented by the signal. In fact, the signal's changing voltage is analogous to the changing pressure of the sound it represents, hence the term analog audio.


Recently, however, the landscape has been altered somewhat. Audio signals are now commonly stored and transmitted as digital information. This offers several advantages over analog audio. For example, there is no loss of audio quality as you make copies of the data. In addition, it is much easier to edit and assemble digital audio information. Finally, there is virtually no tape noise when recording digital audio.

Fortunately, the same basic principles apply to all forms of digital-audio recording, storage, and playback. This includes samplers, digital multitrack tape decks, DATs, hard-disk recorders, and CDs. If this realm remains foreign to you, read on.


Humans use ten digits (0 to 9) to express all numbers; this is called the decimal number system. The decimal system probably arose because we have ten fingers (which are also called digits). To express numbers larger than 9, we combine two or more digits. For example, with two decimal digits, we can express 100 numbers from 0 to 99. With three decimal digits, we can express 1,000 numbers from 0 to 999.

Computers use only two digits: 0 and 1. This is called the binary number system, and binary digits are called bits (short for BInary digiTS). Like humans, computers combine two or more bits to express larger numbers. For example, with two bits, you can express four numbers: 00, 01, 10, and 11. With three bits, you can express eight numbers, from 000 to 111.

Are you starting to see a pattern here? The pattern is this:

Number of numbers you can express = 2^(number of bits you combine)

So, if you have eight bits, you can express 2^8 = 256 numbers; with sixteen bits, you can express 2^16 = 65,536 numbers.
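
If you'd like to check the pattern yourself, here is a minimal Python sketch. The function name values_for_bits is my own invention for illustration; it simply raises 2 to the number of bits.

```python
def values_for_bits(bits: int) -> int:
    """Return how many distinct numbers a group of `bits` bits can express."""
    return 2 ** bits


# Print the count for the bit widths mentioned in the text.
for bits in (2, 3, 8, 16):
    print(f"{bits} bits -> {values_for_bits(bits):,} numbers")
```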

Computers almost universally combine eight bits into what is called a byte; a group of four bits is half a byte, which is called a nibble. These days, most computers also work with groups of bits called words.


The starting point of most digital-audio systems is an analog audio signal from a microphone or other analog source. (Some systems can generate digital audio from scratch without an analog source, but I'm going to put this idea aside for now.) The goal is to convert the analog audio signal into a series of discrete digital numbers that a computer can deal with.

A sample-and-hold circuit measures, or samples, the instantaneous voltage, or amplitude, of an analog audio signal and holds that value until an analog-to-digital converter (ADC) converts it into a binary number. The sample-and-hold circuit then reads the next instantaneous amplitude and holds it for the ADC. This occurs many times per second as the signal's alternating voltage rises and falls. As a result, the smoothly varying analog waveform is converted into a series of "stair steps."
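
To make the sampling step concrete, here is a rough Python sketch that "measures" a sine wave at evenly spaced instants. It only imitates the idea in software; a real sample-and-hold circuit works on a voltage. The function name sample_sine and its parameters are assumptions of mine, not part of any particular product.

```python
import math


def sample_sine(freq_hz: float, sample_rate_hz: float, num_samples: int,
                amplitude: float = 1.0) -> list[float]:
    """Return instantaneous amplitudes of a sine wave taken at a fixed rate."""
    period = 1.0 / sample_rate_hz  # time between measurements, in seconds
    return [amplitude * math.sin(2 * math.pi * freq_hz * n * period)
            for n in range(num_samples)]


# One cycle of a 1 kHz tone measured 8,000 times per second gives 8 values.
for n, value in enumerate(sample_sine(1000, 8000, 8)):
    print(n, round(value, 3))
```

Each printed value is what the circuit would hold steady until the next measurement, which is where the stair-step shape comes from.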

In some systems, the lowest possible instantaneous amplitude is represented by a string of zeros, and the highest possible instantaneous amplitude is represented by a string of ones. In other systems, a string of zeros represents the middle of the possible amplitudes. Values with a zero as the first bit represent amplitudes above the middle (positive), while values with a one as the first bit represent amplitudes below the middle (negative). This is called two's-complement representation, which allows for positive and negative numbers.
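
Here is a small Python sketch of two's-complement coding, assuming a 4-bit word to keep the strings short. The helper name to_twos_complement is hypothetical; the masking trick it uses is a common way to get the bit pattern of a negative integer.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's-complement bit string for a signed integer."""
    lowest = -(1 << (bits - 1))        # most negative code, e.g. -8 for 4 bits
    highest = (1 << (bits - 1)) - 1    # most positive code, e.g. 7 for 4 bits
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking keeps only the low `bits` bits, which wraps negatives correctly.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


for value in (7, 1, 0, -1, -8):
    print(f"{value:3d} -> {to_twos_complement(value, 4)}")
```

Notice that every negative value starts with a one and every zero or positive value starts with a zero, as described above.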

Stereo signals are converted separately and then multiplexed, or combined, into a single stream of binary numbers. The numbers representing the right and left channels are interleaved, or alternated, in the stream.
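
Interleaving is easy to picture in code. The sketch below assumes the left sample comes first in each pair, which is my assumption for illustration; the function names are also my own.

```python
def interleave(left: list[int], right: list[int]) -> list[int]:
    """Multiplex two channels into one stream: L, R, L, R, ..."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    stream = []
    for left_sample, right_sample in zip(left, right):
        stream.extend((left_sample, right_sample))
    return stream


def deinterleave(stream: list[int]) -> tuple[list[int], list[int]]:
    """Demultiplex the stream back into (left, right) channels."""
    return stream[0::2], stream[1::2]


stream = interleave([1, 2, 3], [10, 20, 30])
print(stream)
print(deinterleave(stream))
```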

The most common technique for encoding each instantaneous amplitude is called pulse-code modulation (PCM). Each bit is a code for an electrical or optical pulse; 1 = high-level pulse, 0 = low-level pulse. For example, if an instantaneous amplitude is represented by the binary number 1101, four pulses are sent: high, high, low, high. The rate at which the measurements are taken and the number of bits used to represent each measurement are the two most fundamental concepts in digital audio.
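
The bit-to-pulse mapping can be sketched in a few lines of Python. This is only an illustration of the code itself, not of any real transmission hardware, and the name pcm_pulses is hypothetical.

```python
def pcm_pulses(code: str) -> list[str]:
    """Translate a binary string into the pulse levels that would be sent."""
    if any(bit not in "01" for bit in code):
        raise ValueError("code must contain only 0s and 1s")
    return ["high" if bit == "1" else "low" for bit in code]


print(pcm_pulses("1101"))
```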


The rate at which the instantaneous-amplitude measurements are taken is called the sampling rate, and the time between measurements is called the sampling period. The more often measurements are taken, the higher the frequency that can be accurately represented. However, more measurements require more storage (which we'll discuss in more detail shortly).

If the frequency of the analog signal is low compared with the sampling rate, you get an accurate representation of the signal. If the frequency of the signal is over half the sampling rate, though, some weird things start to happen (more in a moment). The frequency that corresponds to half the sampling rate is called the Nyquist frequency after American engineer Harry Nyquist. For example, if the sampling rate is 48 kHz (48,000 measurements per second), the Nyquist frequency is 24 kHz.

The Nyquist frequency is the maximum frequency that the system can accurately represent and reproduce. This is called the audio bandwidth of the system. For example, if the sampling rate is 48 kHz, the system can represent and reproduce audio signals at frequencies from 0 to 24 kHz. In other words, the audio bandwidth of the system is 24 kHz.

By contrast, the digital bandwidth of the system is the maximum number of bits per second it can transmit or receive. For example, if the maximum sampling rate is 48 kHz and each instantaneous-amplitude measurement is represented with sixteen bits (more in a moment), the digital bandwidth is 48,000 x 16 = 768,000 bits per second, or 768 kbps. In a stereo system, this digital bandwidth would double to 1.536 megabits per second (Mbps).
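
The two kinds of bandwidth are simple arithmetic, sketched here in Python with function names of my own choosing. The figures are the ones from the example above.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Audio bandwidth: half the sampling rate."""
    return sample_rate_hz / 2


def digital_bandwidth_bps(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Digital bandwidth: bits per second for uncompressed audio."""
    return sample_rate_hz * bits * channels


print(nyquist_hz(48_000))                    # audio bandwidth in Hz
print(digital_bandwidth_bps(48_000, 16))     # mono, bits per second
print(digital_bandwidth_bps(48_000, 16, 2))  # stereo, bits per second
```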

When digitizing a signal whose frequency is greater than the Nyquist frequency, you run into a problem called aliasing. In this case, the measurements of instantaneous amplitude don't accurately reflect the shape of the original signal's waveform. The measurements are taken at disparate points along the waveform. When these measurements are reconstructed into an analog signal, it has a lower frequency than the original. (In fact, several alias signals appear above and below the original frequency.)
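
You can predict where the lower-frequency alias lands with a bit of arithmetic. The Python sketch below assumes an ideal sampler with no antialiasing filter and a single sine-wave input; the function name alias_frequency is hypothetical. It folds the input frequency back around the nearest multiple of the sampling rate.

```python
import math


def alias_frequency(signal_hz: float, sample_rate_hz: float) -> float:
    """Return the frequency in the 0-to-Nyquist band that the samples describe."""
    nearest_multiple = math.floor(signal_hz / sample_rate_hz + 0.5)
    return abs(signal_hz - nearest_multiple * sample_rate_hz)


# A 30 kHz tone sampled at 48 kHz is above the 24 kHz Nyquist frequency.
print(alias_frequency(30_000, 48_000))
# A 10 kHz tone is below Nyquist, so it comes back unchanged.
print(alias_frequency(10_000, 48_000))
```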

As a precaution against aliasing, the input signal is sent through an antialiasing filter before it reaches the sample-and-hold circuit. This lowpass filter blocks any frequencies that are greater than the Nyquist frequency of the system while passing all frequencies below the Nyquist limit. The slope of the filter is very steep, which leads many people to call it a brickwall filter.

All CDs use one sampling rate, 44.1 kHz, which is also common among samplers, DATs, hard-disk recorders, and digital multitracks. This rate was adopted as a standard because its Nyquist frequency is 22.05 kHz, which is just above the top of the human hearing range. As a result, all frequencies we can hear are accurately represented. However, there is much debate in the audio industry about whether or not overtones above 20 kHz make an audible contribution to the entire signal. In fact, some DATs are now available with a sampling rate of 96 kHz to address this issue.

Many professional systems offer a sampling rate of 48 kHz in addition to 44.1 kHz. Multimedia titles often use lower sampling rates of 11 kHz or 22 kHz to reduce storage requirements. This yields lower audio quality, which isn't considered as critical in this application because most computer audio-playback systems have relatively low fidelity anyway.

In many samplers, it's possible to use different sampling rates to conserve storage. For example, you might sample the lowest notes of a bass at 11 kHz; there are probably no overtones above 5.5 kHz, so you don't lose anything by sampling these notes at a lower rate. Higher notes can be sampled at 44.1 kHz and combined with the low notes to form an entire sampled bass.

In some systems, the input is sampled at a higher rate than will be used to reproduce the signal; this is called oversampling. As you might imagine, this increases the Nyquist frequency and reduces aliasing. After the signal has been sampled, a digital filter removes any frequency components above the final Nyquist frequency, and the data is output at the final sampling rate.


The number of bits used to represent each instantaneous measurement is called the resolution or word length. The greater the resolution, the more accurately each measurement is represented. However, the more bits you use, the greater the storage requirements (more in a moment). Until very recently, the most common resolution for digital audio was 16 bits. However, many digital audio products use 18 bits, and some professional systems use 20 or 24 bits, whereas multimedia titles often use 8 bits to conserve storage.

The resolution determines the number of steps between the lowest and highest instantaneous amplitude the system can represent. With 16-bit resolution, there are 65,536 steps between the lowest and highest amplitudes. This defines the dynamic range of the system. Theoretically, the dynamic range of a 16-bit system is 98 dB, but various factors reduce this figure to about 90 dB for practical purposes.
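
The 98 dB figure can be reproduced with a short Python sketch. It rests on an assumption the text does not spell out: the commonly used textbook model in which a full-scale sine wave is compared with uniformly distributed quantization noise, which adds about 1.76 dB to the ratio of the step counts. The function name is my own.

```python
import math


def theoretical_dynamic_range_db(bits: int) -> float:
    """Full-scale sine versus quantization noise, in decibels (idealized)."""
    step_ratio_db = 20 * math.log10(2 ** bits)   # about 6.02 dB per bit
    sine_correction_db = 10 * math.log10(1.5)    # about 1.76 dB (assumed model)
    return step_ratio_db + sine_correction_db


for bits in (8, 16, 20, 24):
    print(bits, round(theoretical_dynamic_range_db(bits), 1))
```

Real converters fall short of these numbers, which is why the practical figure for 16 bits is closer to 90 dB.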

No matter how many bits are used to represent each instantaneous measurement, the representation is not always completely accurate. In most cases, the actual measurement value must be rounded to the nearest binary number. This is called quantization, and the difference between the actual measured amplitude and the quantized binary representation is called quantization error.
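
Here is a minimal Python sketch of quantization and its error. It assumes the input amplitude is scaled to the range -1.0 to 1.0 and uses signed integer codes; both the scaling and the function names are choices I made for illustration. Values that fall outside the available codes are limited to the nearest extreme.

```python
def quantize(sample: float, bits: int) -> int:
    """Round an amplitude in the range -1.0..1.0 to the nearest integer code."""
    scale = 2 ** (bits - 1)
    code = round(sample * scale)
    # Limit to the available codes, e.g. -32768..32767 for 16 bits.
    return max(-scale, min(scale - 1, code))


def quantization_error(sample: float, bits: int) -> float:
    """Difference between the true amplitude and its quantized value."""
    return sample - quantize(sample, bits) / 2 ** (bits - 1)


for bits in (8, 16):
    print(bits, quantize(0.3, bits), quantization_error(0.3, bits))
```

With more bits, the steps are finer and the error for the same input shrinks.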

Quantization error can lead to audible quantization noise, which is particularly apparent in signals of low amplitude because only a few bits are used to represent the entire signal. As a result, you should try to keep the input signal's overall amplitude as close as possible to the maximum level that the system can accommodate. Optimizing the gain structure of your audio system can be a big help in this regard (see "Recording Musician: Gain Stages" in the November 1993 EM).

However, you must be careful not to exceed the system's maximum signal level. If the instantaneous amplitude of the input signal rises above the highest point that can be represented by the binary numbers, the signal will be clipped (i.e., the top of the waveform will be chopped off, forming a horizontal line). This makes a very unpleasant noise. Unlike with analog recorders, the input-signal level must not exceed 0 on the VU meter if you want to avoid clipping. Some digital recorders actually calibrate the 0 VU point a few dB below the actual clipping point so users can exceed this level without clipping, as if they were using an analog recorder.

The most common solution to quantization noise is called dithering. In this process, a small amount of noise is added to the input signal before it is measured and quantized. This randomizes the quantization error, reducing its audible effect. For this reason, it is particularly important to apply dithering to minimize audible artifacts that arise when the resolution of a digital-audio signal is reduced, which is a common procedure in multimedia titles.
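
The effect of dither shows up clearly when you reduce 16-bit samples to 8 bits. The Python sketch below is one possible approach, not the method used by any particular product: it assumes triangular dither spanning plus or minus one 8-bit step, made by subtracting two uniform random numbers. The function names are hypothetical.

```python
import random

STEP = 256  # one 8-bit step, measured in 16-bit units


def reduce_plain(sample16: int) -> int:
    """Reduce a signed 16-bit sample to 8 bits by simple rounding."""
    return max(-128, min(127, round(sample16 / STEP)))


def reduce_dithered(sample16: int, rng: random.Random) -> int:
    """Add triangular noise of about one step, then round to 8 bits."""
    dither = (rng.random() - rng.random()) * STEP  # assumed dither shape
    return max(-128, min(127, round((sample16 + dither) / STEP)))


rng = random.Random(1)
quiet = 64  # a quarter of one 8-bit step; plain rounding always loses it
plain = [reduce_plain(quiet) for _ in range(20_000)]
dithered = [reduce_dithered(quiet, rng) for _ in range(20_000)]
print(sum(plain) / len(plain))        # the quiet signal vanishes entirely
print(sum(dithered) / len(dithered))  # on average, the level survives
```

The dithered output is noisier sample by sample, but its average still tracks the quiet input instead of collapsing to zero.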


Once the signal has been digitized into a stream of binary numbers, it is stored in one medium or another. Common media include magnetic tape or disk, optical disc, RAM, and ROM. At a sampling rate of 44.1 kHz and a resolution of sixteen bits, digital audio data consumes over 5 MB per minute for a monaural file or 10 MB per minute for a stereo file. Digital-audio data stored in this manner is referred to as being linear.
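
The storage figures follow directly from the sampling rate and resolution. In this Python sketch, I assume 1 MB means 1,000,000 bytes and that the data is stored with no header or other overhead; the function name is my own.

```python
def bytes_per_minute(sample_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Storage consumed by one minute of linear (uncompressed) audio."""
    bytes_per_sample = bits // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


mono = bytes_per_minute(44_100, 16)
stereo = bytes_per_minute(44_100, 16, channels=2)
print(mono / 1_000_000, "MB per minute, mono")
print(stereo / 1_000_000, "MB per minute, stereo")
```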

To reduce storage requirements, you can reduce the sampling rate and/or resolution, but this also reduces audio quality. Another option is called compression, which is often used in multimedia titles. In this process, the digital-audio data is compressed to reduce storage requirements by as much as 4:1 or 5:1. In other words, a given amount of digital-audio data requires 1/4 or 1/5 as much storage as an equivalent amount of linear data.

There are many types of digital-audio compression, which can be divided into two broad categories: lossy and lossless. Lossy compression provides the greatest storage reduction, but some of the information is lost forever. As a result, lossy compression schemes are designed to lose information that in theory represents sound we wouldn't hear anyway due to masking and other psychoacoustic effects. (However, with most currently available compression schemes, you can, in fact, hear the difference.) Lossless compression retains all the information in a file, but the storage reduction is not as dramatic.


To play a digital-audio signal, you must convert it back into analog form. After some error correction, the digital signal is sent to a digital-to-analog converter (DAC). If it's a stereo signal, it is first demultiplexed to separate the right and left channels.

The analog output of the DAC still has a stair-step shape, which introduces high-frequency artifacts into the signal. In addition, the process of digitization creates images of the original waveform's harmonic spectrum centered at multiples of the sampling rate. For example, if the sampling rate is 44.1 kHz, images of the original spectrum appear centered at 44.1 kHz, 88.2 kHz, etc. You might think that there is no need to bother with these images, which lie outside the human hearing range. However, these frequencies can cause audible problems in other audio components. And if the sampling rate is relatively low (e.g., 11 kHz), the images can be audible.

To solve both problems, another brickwall lowpass filter, called an anti-imaging filter, is traditionally placed after the DAC to remove any sonic components above the Nyquist frequency and smooth out the stair steps. These days, many systems use a digital anti-imaging filter before the DAC, which reduces the phase anomalies that are so problematic with analog brickwall filters.

In many modern systems, a digital filter uses oversampling to create a smoother, more accurate output. In this process, the filter interpolates between the original sample points.


Although many systems use sixteen bits or more to represent each instantaneous measurement, another approach is gaining popularity. This approach is called low-bit conversion because it uses only a few bits, sometimes even a single bit, to represent the audio signal.

How is this possible? Consider the following analogy. Traditional digital-audio systems are like a row of sixteen light bulbs, each controlled by its own switch. There are 65,536 possible on/off combinations, which determine the brightness in the room. Room brightness is analogous to the instantaneous amplitude of an audio signal. However, each bulb has a different inherent brightness, which introduces error into the system. This is analogous to the error introduced by high-bit converters.

You can also control the brightness in the room with a single light bulb by switching it on and off at a high rate. The brightness is determined by how long the light is on relative to how long it is off. This is analogous to a 1-bit converter. When the instantaneous amplitude is high, the converter sends mostly ones; when the amplitude is low, the converter sends mostly zeros. Low-bit converters are inherently more accurate than high-bit converters, but their sampling rate must be much higher than that of high-bit designs.

One way to use fewer bits is called differential coding. This technique is based on measuring the difference between one instantaneous amplitude and the next rather than the amplitudes themselves. It generally requires fewer bits to accurately represent the differences, which are smaller than the actual amplitudes. For example, delta modulation quantizes the difference (which is often represented by the Greek letter delta) between consecutive amplitudes.
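
Here is a rough Python sketch of the simplest form of delta modulation, in which each difference is quantized to a single bit: one step up or one step down. The fixed step size and the function names are my assumptions; practical designs are more elaborate.

```python
def delta_modulate(samples: list[float], step: float) -> list[int]:
    """Send 1 if the input is at or above the running estimate, else 0."""
    estimate = 0.0
    bits = []
    for sample in samples:
        bit = 1 if sample >= estimate else 0
        estimate += step if bit else -step  # the decoder tracks this same value
        bits.append(bit)
    return bits


def delta_demodulate(bits: list[int], step: float) -> list[float]:
    """Rebuild the staircase estimate from the stream of up/down bits."""
    estimate = 0.0
    output = []
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


bits = delta_modulate([0.3, 0.6, 0.6, 0.1], step=0.25)
print(bits)
print(delta_demodulate(bits, step=0.25))
```

The rebuilt signal can only move one step per sample, so the input has to be sampled quickly relative to how fast it changes.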

A more sophisticated variation is called delta-sigma modulation. (This is sometimes called sigma-delta modulation, although some audio professionals make a distinction between these terms, using them to describe slightly different techniques.) This process takes the difference (delta) between the current instantaneous amplitude and the integral of the quantized previous difference. (Integrals are mathematical operations related to sums, and sums are often represented by the Greek letter sigma.) Delta-sigma converters provide excellent sound quality at a lower price, which is why they are used so much these days.
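
To get a feel for the idea, here is a sketch of a first-order, 1-bit delta-sigma modulator in Python. It is a common textbook arrangement, not the design of any specific converter, and it assumes input amplitudes scaled to the range -1.0 to 1.0. The function name is hypothetical.

```python
def delta_sigma_1bit(samples: list[float]) -> list[int]:
    """Turn amplitudes in -1.0..1.0 into a stream of single bits."""
    integrator = 0.0  # running sum (sigma) of the differences
    feedback = 0.0    # previous output bit expressed as +1.0 or -1.0
    bits = []
    for sample in samples:
        integrator += sample - feedback  # difference (delta), then sum
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


# A steady amplitude of 0.5 should produce ones about 75 percent of the time.
bits = delta_sigma_1bit([0.5] * 4000)
print(sum(bits) / len(bits))
```

The proportion of ones follows the input level, much like the single light bulb that is switched on for a larger share of the time to make the room brighter.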

Digital-audio systems are difficult to design and build, but the basic concepts are relatively easy to understand. Once you grasp these concepts, you can optimize your use of samplers, DATs, digital multitracks, and hard-disk recorders and enjoy high-quality audio for relatively little monetary investment. In addition, digital products always improve their performance while falling in price, so the future looks bright for all forms of digital audio.

Scott Wilkinson digs digital audio. Thanks to Ken Pohlmann for his help with this article.