Imagine trying to play a duet with another musician standing two and a half football fields away. Because sound travels through air at about 1,000 feet per second, each of you would have to wait approximately three-quarters of a second (750 ms) to hear the other, making collaboration frustratingly difficult. Yet it's not uncommon for computers to impose that amount of delay on audio. For anyone trying to record music on a computer or play software synthesizers, that delay, called latency, can make the process feel as though you are trying to run with a 300-pound linebacker wrapped around your legs.
In this article, I'll explain what causes latency and how to reduce it to a musically acceptable level. Because of the way computers handle audio, latency is often unavoidable. But if you understand why it happens and how to work with it, you'll be able to get the most expressive performance out of your setup.
Illustration: Jack Desrocher
Latency simply means the time it takes a computer-music system to move audio from one place to another. Every time the signal is altered (converted from analog to digital form, for example, or passed through a reverb effect), the computer needs time to do the calculations. In addition, computers transfer audio data in chunks rather than in a continuous stream, which introduces a delay equivalent to the duration of the chunk.
This chunking technique is actually used to ensure a continuous flow of audio, which may at first seem counter-intuitive. As BIAS programmer Dug Wright explains, “In the analog world, audio moves through a studio as fast as the electrons can travel, which is basically the speed of light. You can plug in your guitar or keyboard, play a note, and its sound comes out of the speakers at essentially the same instant. In a digital system, there is a clock dividing audio into little slices (called samples) at the sampling rate. Then that little slice of audio is moved around by a network of gates that can open and close only at the frequency of another clock — about 800 MHz with a modern computer. Because the CPU has to do many things, like draw graphics and move the cursor, it divides tasks such as moving audio data into chunks or buffers of samples.”
To see why grouping samples into buffers helps to ensure a smooth flow of audio, imagine giving yourself a shower with a squirt gun. If your finger cramped up or the gun ran dry, the stream would stop, interrupting your shower and leaving you cold. Now imagine using a bucket instead of the gun. By filling the bucket and then punching a hole in the bottom, you'd guarantee yourself an uninterrupted shower. And as long as you could add water to the bucket faster than it drained, your shower could continue indefinitely. The drawback is that you wouldn't be able to begin your shower immediately because it would take time to fill the bucket. Moreover, if you wanted to change the water temperature, you'd have to wait for the most recent squirt of water to work its way down to the bottom of the bucket.
FIG. 1: Optimizing your computer for audio involves finding the best balance between responsiveness (latency) and CPU usage. This control panel from Native Instruments' Reaktor, a software synthesizer, details some of the trade-offs.
In audio terms, the bucket would be the buffer, and the time it took the new water to descend to the bottom would be the latency. If the buffer “runs dry,” you will likely hear clicks and dropouts. Most professional audio interfaces allow you to set the size of the buffer in software (see Fig. 1); some let you set the number of buffers as well. Although smaller buffers reduce latency, they also require more CPU time. Every time a buffer fills up, it sends an interrupt message to the computer to indicate that the samples are ready to be processed. That forces the computer to process interrupts instead of audio data, which reduces the number (and complexity) of effects and synthesizer voices it can produce.
Play it safe by using a larger buffer, and you'll notice a correspondingly longer wait before hearing the effects of any musical changes you make — twisting an onscreen mixer knob, for example. Balancing the risk of glitches with system responsiveness is the art of optimizing latency; I'll cover that process in a moment.
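To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes a 48 kHz sampling rate, one interrupt per filled buffer, and that a buffer's delay is simply its length divided by the sampling rate. The function names and the list of buffer sizes are illustrative, not taken from any driver or control panel.

```python
SAMPLE_RATE_HZ = 48_000  # assumed sampling rate for this illustration


def buffer_latency_ms(buffer_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to fill (or drain) one buffer, in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


def interrupts_per_second(buffer_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """How often a full buffer is handed off, assuming one interrupt per buffer."""
    return sample_rate_hz / buffer_samples


def tradeoff_table(buffer_sizes, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (size, latency in ms, interrupts per second) for each buffer size."""
    return [
        (
            size,
            buffer_latency_ms(size, sample_rate_hz),
            interrupts_per_second(size, sample_rate_hz),
        )
        for size in buffer_sizes
    ]


if __name__ == "__main__":
    # Hypothetical buffer sizes, chosen only to show the trend.
    for size, latency, rate in tradeoff_table([64, 256, 1024, 4096]):
        print(f"{size:>5} samples: {latency:6.2f} ms per buffer, {rate:7.1f} interrupts/s")
```

Halving the buffer halves the wait but doubles the number of interrupts the computer must service each second.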
LATENCY, STEP BY STEP
As mentioned, the total latency in a system is the sum of several smaller latencies. To demonstrate how these delays add up, Wright offers the following map, which traces the audio signal through the computer from input to output:
- The audio interface's analog-to-digital (A/D) converter samples the audio, which usually induces a delay of about 1.5 ms.
- The interface stores 64 to 4,096 or more samples in a buffer; it then tells the operating system, through an interrupt, that it's ready to hand them off. Consequently, the oldest sample in the buffer is delayed by the duration of the buffer. This delay is equivalent to the number of samples the buffer can hold divided by the sampling rate. For a 1,024-sample buffer at 48 kHz, that's about 21 ms. (Increasing the sampling rate would lower the latency, but there's no free lunch, because a higher rate fills the buffer more quickly, increasing the frequency of interrupts and the strain on the CPU.)
- The system has to stop what it was doing (because it was interrupted) and get the audio data. Depending on the operating system, that takes anywhere from 500 microseconds (0.5 ms) to 250 ms. That delay is called scheduling latency, and while it isn't part of the latency added to the audio, it can force you to use larger buffers in order to process the audio in real time. (The buffer's duration must be longer than the scheduling latency.)
- The operating system then delivers the audio data to the music program through an intermediary piece of software called a driver. Now the application can process the data (record it, mix it, apply effects, and so on) and send it back to the audio card through the operating system to be output. This is where processor speed (more specifically, floating-point operations per second, or FLOPS) comes into play. A faster processor will be able to perform more operations in the allotted time.
- During the processing of the audio signal, different signal processors can induce delay. This is referred to as algorithmic latency. Certain types of filters can delay the signal by several samples (sometimes up to hundreds of microseconds); dynamics and intonation plug-ins are especially demanding. If an application is compensating for those delays to maintain sample accuracy between channels, then another few hundred microseconds of latency can be added.
- The audio interface usually accepts data only in chunks that are the same size as the input buffers and at the same time that it's ready to hand off another buffer of incoming data. With the numbers we've been using, that means the samples are again delayed by 21 ms.
- The audio card converts the audio back to analog, taking another 1.5 ms.
So to tally up, an audio card running at a 48 kHz sampling rate and set to 1,024 samples per buffer will induce approximately 1.5 ms (A/D) plus 21 ms (input) plus 0.5 ms (algorithmic) plus 21 ms (output) plus 1.5 ms (D/A), or 45.5 ms total latency from analog input to analog output.
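The tally above can be reproduced in a few lines of Python. This is only a sketch: the stage names and the function are invented for illustration, and the conversion and algorithmic delays are the article's example figures rather than measurements. Because the code does not round each buffer to 21 ms, the total for a 1,024-sample buffer at 48 kHz comes out near 46.2 ms rather than 45.5 ms.

```python
A_D_MS = 1.5          # A/D conversion delay quoted in the article
D_A_MS = 1.5          # D/A conversion delay quoted in the article
ALGORITHMIC_MS = 0.5  # the article's example figure for algorithmic latency


def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Buffer delay: number of samples divided by the sampling rate."""
    return buffer_samples / sample_rate_hz * 1000.0


def round_trip_latency_ms(buffer_samples, sample_rate_hz, algorithmic_ms=ALGORITHMIC_MS):
    """Return (per-stage delays, total) from analog input to analog output.

    Scheduling latency is left out on purpose: per the article, it is not
    added to the audio path, though it limits how small the buffer can be.
    """
    one_buffer = buffer_latency_ms(buffer_samples, sample_rate_hz)
    stages = {
        "A/D conversion": A_D_MS,
        "input buffer": one_buffer,
        "algorithmic": algorithmic_ms,
        "output buffer": one_buffer,
        "D/A conversion": D_A_MS,
    }
    return stages, sum(stages.values())


if __name__ == "__main__":
    stages, total = round_trip_latency_ms(1024, 48_000)
    for name, delay in stages.items():
        print(f"{name:<15} {delay:6.2f} ms")
    print(f"{'total':<15} {total:6.2f} ms")
```

Trying other buffer sizes shows that the two buffer stages dominate the total; the converter and algorithmic delays stay fixed at a few milliseconds.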
FIG. 2: A growing number of computer audio interfaces, such as the Emagic EMI 2/6, offer zero-latency monitoring. By routing incoming audio directly to the interface's output rather than looping it through the computer, this design allows you to hear yourself with no delay as you record.
Fortunately, musicians don't have to keep track of all these little steps — playing 21.5 ms ahead of the beat so their overdubs will sync with previously recorded tracks, for example. “All modern driver architectures have a mechanism to report the latency to the host application,” explains Emagic's Gerhard Lengeling, “so the application can simply move the recorded data position on the timeline” to compensate during overdubs.
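As a rough illustration of what Lengeling describes, the sketch below slides a recorded clip earlier on the timeline by the latency the driver reports. The Clip class, the function names, and the sample-based timeline are hypothetical stand-ins; real host applications and driver architectures expose this differently. The 21.5 ms value is the article's example, converted to samples at an assumed 48 kHz.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """Hypothetical recorded clip: a timeline position plus its audio samples."""
    start_sample: int
    samples: List[float]


def ms_to_samples(latency_ms, sample_rate_hz):
    """Convert a latency in milliseconds to a whole number of samples."""
    return round(latency_ms * sample_rate_hz / 1000.0)


def compensate(clip, reported_latency_samples):
    """Return a copy of the clip moved earlier by the reported latency."""
    if reported_latency_samples < 0:
        raise ValueError("latency cannot be negative")
    new_start = clip.start_sample - reported_latency_samples
    if new_start >= 0:
        return Clip(new_start, list(clip.samples))
    # The clip would begin before the start of the timeline, so trim the
    # leading samples that fall off the front and pin the clip at zero.
    trim = -new_start
    return Clip(0, list(clip.samples[trim:]))


if __name__ == "__main__":
    latency = ms_to_samples(21.5, 48_000)  # assumed value a driver might report
    take = Clip(start_sample=96_000, samples=[0.0] * 4_800)
    fixed = compensate(take, latency)
    print(f"moved clip from sample {take.start_sample} to {fixed.start_sample}")
```

The performer plays in time with what is heard, and the application does the bookkeeping afterward.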
Many newer audio interfaces feature zero-latency monitoring, which routes a copy of the incoming signal directly to the interface's output (see Fig. 2). Because the copy you're monitoring doesn't pass through the computer, you hear it instantly, but you won't be able to hear any software-based effects processing. If you do want to record or monitor with effects, however, you can use the old-school solution: simply route the input signal through an outboard hardware mixer and effects, then monitor the mixer's output rather than the computer's.
Similarly, you can use an audio interface such as the Korg OASYS PCI (see Fig. 3), which has onboard digital signal processing (DSP) chips. This design puts the outboard effects on the audio card itself for instant aural gratification. The ultimate (albeit the most expensive) approach is to use a complete DSP-based recording system such as Digidesign Pro Tools. These systems minimize latency by doing all their processing on custom hardware, using the host computer solely for graphics and disk access. (You still get the 1.5 ms A/D and D/A conversion delays.)
FIG. 3: Audio interfaces with digital signal processing (DSP) chips, such as this Korg OASYS PCI, can often apply effects to incoming audio. That allows you to hear effects while recording—without the latency you would get if the signals were processed on the computer itself.
DRIVE TO SUCCEED
There are ways to improve latency without hauling out the heavy artillery, though. One of the best is to update your drivers, the tiny pieces of code that handle communication between the music software and the audio interface. Under Windows, this generally means substituting ASIO or WDM drivers for the standard, comatose MME drivers. (Note that your interface and software must support the new driver type.) The default Sound Manager drivers on the Mac provide reasonable latency performance (about 11 ms), but Macs can also benefit from an ASIO upgrade. Most audio interfaces support a variety of drivers; the sidebar “Too Little, Too Latent” offers tips on testing them in your setup. For background on ASIO, WDM, and other driver standards, see “Desktop Musician: Musical Protocols,” in the December 2001 issue of EM and also online.
As mentioned, a faster computer can reduce latency by doing an equivalent amount of processing in the space of a smaller buffer. But you can ease the load on your current CPU by turning off extensions and background tasks. Some people even replace graphics cards that handle interrupts poorly, delaying the computer's response to audio interrupts.
Upgrading your operating system may help, too. “Possibly OS X will be the audio platform for its good real-time behavior,” says Adam Castillo of interface manufacturer M-Audio, which has achieved latencies of less than 1.5 ms on the new Mac OS. The company has also had success under Windows 2000 and XP, clocking 1.5 ms latencies using its Delta cards, Cakewalk Sonar, and WDM drivers. A recent study at the Peabody Institute found that the Linux operating system outperformed both OS X and Windows in some configurations. You can read the study at http://gigue.peabody.jhu.edu/~mdboom/latency-icmc2001.pdf.
Emagic's Clint Ward puts it succinctly: “Latency really occurs when someone is trying to use the native audio system on a slow CPU with no RAM and an old audio I/O device with terrible drivers.”
Whoever said “It's never too latent” probably wasn't a musician. But although some amount of latency is endemic to computer music systems, it's not necessarily the show-stopper that it's often made out to be. Every day, live bands crank out killer performances while their members are standing 15 feet or more apart. Although we don't normally think of it in such terms, this separation creates a “latency” of 15 ms, which is ten times the amount easily obtainable with today's sound cards and software. When you consider that early computer musicians had to wait hours to hear a single note, we've got it pretty easy.
David Battino is the editor of EM's 2002 Desktop Music Production Guide. He would like to thank Adam Castillo of M-Audio, Greg Ondo of Steinberg, Dug Wright of BIAS, and Clint Ward and Gerhard Lengeling of Emagic for their contributions to this article.
TOO LITTLE, TOO LATENT
When you're confronted with arcane control panels for juggling buffer size and CPU overhead, it's easy to forget that the ultimate goal is to make music. While most experts I spoke with described complex scenarios involving oscilloscopes, auxiliary computers, and other lab-coat apparatus, Clint Ward of Emagic shared this uniquely musical way to optimize any new audio device or driver:
- Install the driver and start your test with its default buffer settings.
- Launch a virtual instrument such as the Emagic EXS24 software sampler and load up a drum kit. (A swelling string sound will not work for this test.)
- Play the drum kit in real time and make a judgment on the feel.
- Change the driver's buffer settings to the lowest possible value.
- Play the drum kit again and make a judgment on the feel and the stability of the driver at that setting.
- Record track after track until the driver or the CPU falls down.
Ward remarks, “In this test you will either experience problems immediately (at the lowest latency settings) or you will eventually hit a computational wall. It will shed immense light on the maturity of the driver, the audio device, and [whether] you need a new CPU. If the driver doesn't fall down until you reach the computational roadblock, you have a potential winner. Increase the buffer size until you perceive latency, then ease off one notch. I feel you give up way too much CPU at the lowest setting when a less greedy one will still feel fine.”
As Brian Smithers noted in “Desktop Musician: Musical Protocols,” in the December 2001 issue of EM, another trick is to reduce the size of the buffer while recording soft-synth parts, then restore it to a safe level during mixdown. That technique provides fast response when you need it most — during live performance.