Audio IDOL

The human voice is arguably the most important form of communication in the world. People gravitate to it instinctively, almost involuntarily. Producers and engineers place an extremely high degree of focus on the vocal in a recording, so this piece will examine three important aspects of seeing a great vocal sound through to the final mix: tracking, editing and tuning. Hopefully, you can learn a thing or two that will help improve that most precious element of your mix.


Have you ever heard the old saying “There's never time to do it right, but there's always time to do it over”? Pithy platitudes aside, you will save a lot of time by getting the vocal right the first time. Although editing and tuning will be part of this discussion, it's a lot easier to nail it down during tracking. There are no perfect vocalists, so you probably won't get the vocal exactly the way you want it during tracking, but I recommend putting a significant effort into it. And one final important note before you begin: Try to record all of the vocals for a song in one session. If not, you'll later have to reproduce the entire setup very precisely to get the same exact sound. If you can't do it all at once, carefully document every little detail so you can reproduce the setup later.

First of all, think about acoustics. Record in a tight and absorptive space, because reverb can be added but not subtracted, and surround your vocalist with absorptive material. Fabric that's heavy and thick works wonderfully, or you can try a closet with lots of hanging clothes. Also, don't expect absorptive materials to eliminate outside noises — they won't. Use a centrally located closet at night or whenever it's quiet. (All the magic happens after midnight, anyway!)

Microphone choice is also very important. Beg, borrow, steal or rent the best mic you can get your hands on, and use a cardioid pattern. Large-diaphragm condensers, with their high definition and resolution, are generally best. Ribbon or dynamic mics can also do the trick. If you're using a tube mic, turn it on and let it warm up for at least an hour before recording. Use the mic's internal highpass filter, removing everything below 75 Hz or so. This can reduce air-conditioning noise, but I recommend turning the A/C off during takes. Place the mic anywhere from six to 12 inches in front of the vocalist, and locate it at about chin height to get chest resonance. A pop filter keeps the vocalist far enough off the mic and minimizes plosive popping.

Now that you have your mic, be sure to use a high-quality preamp. A stand-alone mic pre will sometimes produce better quality than the ones in your mixer. Also, don't overdrive the preamp. Save the distortion for mixdown. Compressing during recording is preferable for two reasons: It gets more apparent level to the recording medium, and listeners have grown accustomed to hearing it. I recommend against ratios of more than about 3:1, but you might go higher with an extremely dynamic vocal. If the vocal ranges from a whisper to a scream, track the different sections separately. Furthermore, don't overdo EQ during recording, and remember that you're better off cutting than boosting. A condenser mic will give you the entire frequency range, and it's probably best to wait until mixdown to do more radical EQ surgery.
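To make the ratio concrete, here is a minimal sketch of the static level curve of a hard-knee compressor. The function name and the -18 dB threshold are illustrative assumptions, not settings from any particular unit, and attack, release and makeup gain are ignored.

```python
def compressed_level_db(input_db, threshold_db=-18.0, ratio=3.0):
    """Static output level of a hard-knee compressor, in dB (no makeup gain)."""
    if input_db <= threshold_db:
        return input_db  # below the threshold the signal passes unchanged
    # Above the threshold, every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


# A peak 12 dB over the threshold comes out only 4 dB over it at 3:1.
print(compressed_level_db(-6.0))  # -14.0
```

At 3:1 the loudest peaks are pulled down noticeably while anything under the threshold is untouched, which is why a moderate ratio tames a vocal without flattening it.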

Presuming that you're recording to Digidesign Pro Tools, Emagic Logic or some other DAW, analog-to-digital conversion is an issue. Higher resolution means better quality, but it also means using more disk space, RAM and CPU power. Recording at 24-bit resolution is nice, and higher sample rates are great, but 16-bit is usually just fine, and sample rates beyond 96 kHz are overkill unless you have a dead-quiet room, an incredible signal chain and a world-class vocalist. Don't worry about filling up all the bits. Get a reasonable level, but remember that you risk digital clipping when you tickle the top.
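The disk-space cost of higher resolution is easy to estimate. The sketch below assumes uncompressed PCM audio and ignores file headers; the function name is made up for illustration.

```python
def megabytes_per_minute(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed PCM storage for one minute of audio, in MB (10**6 bytes)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * 60 / 1_000_000


for rate, bits in [(44_100, 16), (48_000, 24), (96_000, 24)]:
    size = megabytes_per_minute(rate, bits)
    print(f"{rate} Hz / {bits}-bit mono: {size:.2f} MB per minute")
```

A mono vocal at 96 kHz and 24-bit takes a little more than three times the space of the same take at 44.1 kHz and 16-bit, and that multiplies across every take you keep.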

Monitoring is yet another important factor. The headphones should seal tightly around the ears to reduce bleed. If the vocalist wants the headphones loud, find a compromise, and have reverb available if the vocalist wants it. Reverb plug-ins will cause too much latency, so use an outboard unit. Vocalists perform best when the headphone mix sounds good to them, so let them dictate the mix.

Next, bounce a rough mix of the music as a guide track. Mute all but the guide and vocal tracks to reduce crashes and other glitches while you record. One school of thought for tracking is to just go for it and punch in to correct problems as you go. The other is to record multiple takes and comp the vocal together later. I prefer a hybrid approach. Set up three tracks, have the vocalist do three takes all the way through the song and then invite him or her in to listen and critique. Select one phrase at a time, and allow the producer and the vocalist to hear all three takes of that phrase. If one is great, keep it. If you have two that are really good, keep both. The rule, however, is that at least one must be eliminated, so literally select the audio and “remove from session.” The magic here is comping on the fly. Leave the keeper takes where they are, and punch in to the space left by the elimination of a bad take. If none of the three takes meets expectations, eliminate all three, send the vocalist back to the booth and punch in.

Allow enough pre- and post-roll time to keep the vocalist in the groove. Also, make sure that your DAW is configured so that the vocalist can sing along with the pre-existing vocal, which gets him or her on pitch and in time. (This is called Auto Input Monitoring in Pro Tools.) When the punch happens, the vocalist flows right into it. Place both punch-in and punch-out points between words. It's nearly impossible to get a really smooth punch in the middle of a word.

This hybrid tracking technique works quite well. Usually, the vocalist only goes in the booth twice (if vocalists are really good, sometimes only once). You may discover that punching as you go suits you better, but I strongly recommend trying the hybrid technique.


Human vocals, like most other naturally occurring sounds, tend to decay over time. When editing, you don't want to chop off the natural decay (the tail) under normal circumstances. Similarly, you won't want to chop off the beginning of the word or phrase (called the front or the top), either. One other important thing to keep in mind is that all edits should happen as close as possible to a point where the waveform passes through zero amplitude (the little line running along the middle of the track). Cutting there avoids the sudden jump in level that you would hear as a click. This is particularly important if you're removing audio with the intention of joining the two pieces on either side back together. Also, you've hopefully tracked all of the vocals at one time. If not, some tracks may sound drastically different from others, causing hours of futzing with EQ, noise and other issues.
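If you like to think of the zero-crossing rule in terms of samples, the following sketch finds the crossing nearest to a rough edit point. It assumes the audio is a plain list of sample values between -1.0 and 1.0; the function name is hypothetical and is not taken from any DAW.

```python
def nearest_zero_crossing(samples, index):
    """Return the index nearest to `index` where the waveform crosses zero.

    A crossing is reported at position i when samples[i - 1] and samples[i]
    have opposite signs (or one of them is exactly zero), so an edit placed
    between those two samples lands where the amplitude is smallest.
    If the audio never crosses zero, the original index is returned.
    """
    crossings = [
        i for i in range(1, len(samples))
        if samples[i - 1] * samples[i] <= 0
    ]
    if not crossings:
        return index
    return min(crossings, key=lambda i: abs(i - index))


wave = [0.5, 0.8, 0.3, -0.2, -0.6, -0.1, 0.4, 0.7]
print(nearest_zero_crossing(wave, 5))  # 6
```

Most editors will snap a selection to zero crossings for you, so treat this as a picture of what that feature does rather than something you need to run.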

At mixdown, it's desirable to eliminate noise and other unwanted junk between phrases and words. With any DAW, you can eliminate what you don't want and keep what you do. Don't overdo the edit and chop off important material at the beginning or the end. Leave a bit of space (milliseconds will do) at the front and then fade into the top of the word. At the tail, start the edit a bit after the decay reaches dead silence and then put a fade on the very end. Most DAWs have a “strip silence” function to eliminate dead space, and it can be useful. Set the threshold to remove the big chunks of dead air. Then, go back to tighten up the edits and do your fade-ins and fade-outs.
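As a rough picture of what a strip-silence pass does, the sketch below scans a list of level readings and returns the stretches worth keeping. It assumes the input is a list of per-block peak levels (an envelope) rather than raw samples, and the threshold and padding values are arbitrary; real DAW implementations differ in their details.

```python
def find_regions(levels, threshold=0.02, pad=1):
    """Return (start, end) block ranges to keep; `end` is exclusive.

    A block counts as signal when its level exceeds `threshold`. Each region
    is widened by `pad` blocks on both sides so the top and tail of a phrase
    survive, and regions that touch or overlap after padding are merged.
    """
    regions = []
    start = None
    for i, level in enumerate(levels):
        loud = level > threshold
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(levels)))

    padded = []
    for begin, end in regions:
        begin = max(0, begin - pad)
        end = min(len(levels), end + pad)
        if padded and begin <= padded[-1][1]:
            padded[-1] = (padded[-1][0], end)  # merge with the previous region
        else:
            padded.append((begin, end))
    return padded


levels = [0.0, 0.01, 0.3, 0.5, 0.2, 0.01, 0.0, 0.0, 0.0, 0.0, 0.4, 0.6, 0.01, 0.0]
print(find_regions(levels))  # [(1, 6), (9, 13)]
```

The padding plays the role of the space left at the front and tail of each phrase. The fades still need to be added by hand or with the DAW's batch-fade function.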

Just because you have the ability to strip the vocal down to the bone doesn't mean that you always should. Some engineers carefully establish perfect level and phase relationships between mics during live tracking, and chopping up the vocal track may destroy an otherwise excellent balance. If you're overdubbing the vocals, however, you will almost certainly want to clean things up. But what of breaths? Some producers insist on removing every breath, lending a slick polish to the final product. It can be argued, however, that it sucks soul and life out of the vocal. The vocal definitely sounds more human with the breaths left in. A big breath, for instance, is a little foreshadowing of something big about to happen. Put some thought into whether the vocal could benefit from some humanness.

Eliminating pops, clicks, lip smacks and other little mouth noises is usually a good idea. If the problem is really bad, retrack. Otherwise, editing is the way to contend with it. Take, for example, the little lip smack that happens when the vocalist opens his or her mouth to begin singing. This will be eliminated when you remove the noise between words and phrases. Luckily, most mouth noises fit this description. Sometimes, you'll find ticks or snaps in the middle of a word. Once again, retrack if it's drastic. Otherwise, zoom in on the audio significantly (at least to millisecond level or maybe even near-sample level). The tick will look like a spike in the middle of an otherwise smooth waveform. Inside your audio editor, use the pencil tool (or whatever equivalent you have) to draw a nice smooth line that approximates what the wave should have done where the spike was.
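The pencil-tool repair amounts to replacing a few bad samples with values that connect the good audio on either side. Here is a minimal sketch that draws a straight line across the damaged span. The function name is hypothetical, and a straight line is the simplest possible choice; a hand-drawn curve that follows the shape of the surrounding wave will usually sound better over longer spans.

```python
def redraw_span(samples, start, end):
    """Return a copy with samples[start:end] replaced by a straight line.

    samples[start - 1] and samples[end] are treated as good audio and serve
    as the anchor points, so 1 <= start < end < len(samples) is required.
    """
    left, right = samples[start - 1], samples[end]
    steps = end - start + 1
    repaired = list(samples)
    for k in range(start, end):
        fraction = (k - start + 1) / steps
        repaired[k] = left + (right - left) * fraction
    return repaired


clicked = [0.10, 0.20, 0.90, -0.70, 0.50, 0.60]  # spike at positions 2 and 3
print([round(s, 2) for s in redraw_span(clicked, 2, 4)])
```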

Time correction is a welcome convenience afforded by DAW recording. Some applications can even do it for you automatically. I recommend Celemony Melodyne. It provides not only astonishing (and very musical) timing correction but also incredibly good pitch correction. With luck, the vocalist will either rush or drag on a consistent basis. If so, simply select the poorly timed part and then play that selection. Listen to how it fits in the groove and then nudge it into position. Nudge resolution in Pro Tools can be in beats, actual time or samples. If 64th-note increments take me too far in either direction, I'll work at the millisecond level; 60,000 divided by the tempo in beats per minute gives the length of a quarter note in milliseconds. Obviously, the length of an eighth note is half of that, a 16th note is a quarter of that and so on.
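The note-length arithmetic is worth keeping handy. This small sketch assumes the tempo is given in quarter-note beats per minute; the function name is only for illustration.

```python
def note_length_ms(tempo_bpm, note_value=4):
    """Length of a note in milliseconds.

    note_value is the denominator of the note: 4 = quarter, 8 = eighth,
    16 = 16th and so on.
    """
    quarter_ms = 60_000 / tempo_bpm
    return quarter_ms * 4 / note_value


for value in (4, 8, 16, 64):
    print(f"1/{value} note at 120 bpm: {note_length_ms(120, value):.2f} ms")
```

At 120 bpm a 64th note is 31.25 ms, so when a nudge of that size overshoots, dropping to steps of a few milliseconds is the natural next move.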

Sometimes, a vocalist will drag in one spot and rush in another, which makes things a bit more complicated but not insurmountable. You have to listen very carefully. It can be quite tricky to determine which part is early and which is late. Once you know that, nudging is all that remains. Nudge early sections later and late sections earlier. But what if this causes these sections to overlap? In some cases, it's no big deal. If it is, then time compression and expansion become necessary. If a word is held too long, select it and time-compress it. The AudioSuite Time Compression/Expansion plug-in in Pro Tools makes this easy. By enabling Grid mode, you can literally see how long a note should be. You can be mathematical about time compression or expansion, but undo allows trial and error, which invites happy accidents that may yield mathematically imperfect results that are nonetheless pleasing from a groove standpoint. If you compress an isolated syllable, a gap will be created at the end of it, and you will need to move the end of that word earlier to close it. A crossfade may also be necessary to smooth the edit.
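If you do want to be mathematical about it, the sketch below works out how much to compress a held word so that it fills a given number of beats, and how large the resulting gap will be. The function name and the example figures are assumptions for illustration, and individual plug-ins may express the ratio differently (as a percentage or as source length over target length, for instance).

```python
def compression_plan(current_ms, tempo_bpm, beats):
    """Return (target_ms, ratio, gap_ms) for fitting a word to the grid.

    ratio is target length divided by current length, so values below 1.0
    mean compression. gap_ms is the space left behind after compressing.
    """
    target_ms = beats * 60_000 / tempo_bpm
    ratio = target_ms / current_ms
    gap_ms = current_ms - target_ms
    return target_ms, ratio, gap_ms


# A word held for 650 ms that should last one quarter note at 120 bpm.
target, ratio, gap = compression_plan(650, 120, 1)
print(f"target {target:.0f} ms, ratio {ratio:.3f}, gap to close {gap:.0f} ms")
```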

With editing, you can do much more than just clean up and correct timing. You can also do dynamics processing and even EQ. Don't bother if you're attempting to achieve a natural-sounding vocal, particularly for acoustic music — you're not going to be happy with the results. If, on the other hand, you want a processed, slick or even nonhuman vocal, get crazy and have fun. You can flange, chorus, phase, distort or put a “telephone”-style EQ on a single phrase or hard compression on a line that's out of control in terms of level. However, leave time-based processing (reverb and delay in particular) to real-time processing during mixdown.


Pitch is generally the most problematic issue with vocals. Much time and effort have gone into the development of manual and automatic ways to fix it. As with timing problems, one of the more difficult issues is listening. Hearing pitch problems is as much art as science. Most people have good relative pitch, but very few have perfect pitch, so defer to the ear of the producer, even though some are seemingly tone deaf and others are excruciatingly meticulous. Some pitch problems are abundantly obvious, and others are more subtle or even tricky auditory illusions.

To begin to combat these issues, be aware of the frame of reference. Make sure that every tonal instrument in the song is tuned to A440, or big problems can arise. I'll presume that this is the case. A good, solid monitoring level will help you hear the vocal pitch relative to the music. Listening to a phrase at a time is fine with a good vocalist, but with a weaker one, you may need to isolate individual words, probably even syllables. When working manually, you need to isolate each tiny element that is flat or sharp; then, select and pitch it up or down. Take great care when selecting audio, as you don't want to inadvertently “untune” something that was already correct. Listen carefully!
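Flat and sharp can be put into numbers. Pitch deviation is commonly measured in cents (hundredths of a semitone), and with A440 as the reference, any frequency maps to a nearest note plus an offset. The sketch below assumes equal temperament and the usual MIDI numbering in which A4 is note 69; the function name is made up for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(frequency_hz, a4_hz=440.0):
    """Return (note name with octave, deviation in cents) for a frequency.

    Negative cents mean the pitch is flat of the named note, positive sharp.
    """
    midi = 69 + 12 * math.log2(frequency_hz / a4_hz)
    nearest = round(midi)
    cents = (midi - nearest) * 100
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


name, cents = nearest_note(432.0)
print(f"{name} {cents:+.1f} cents")  # A4, roughly 32 cents flat
```

A pitch detector has to supply the frequency in the first place, which this sketch does not attempt.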

After isolating the bad note, just pitch it up or down to the correct note. The correction will be different for each element, but your ear will develop with trial and error, and, eventually, you'll be able to nail it pretty quickly — and you can always undo if you under- or overcorrect. If you don't get it right on the first try, I recommend returning to the original bad pitch before proceeding. Don't get bogged down in the following quagmire: “We went up by 35, then down by 10, then back up by 7, so we're up by 32, right?” As you correct each note, simply move on to the next. This process can yield great results, but it can be a bit tedious. I recommend manual correction only if you have just a handful of bad notes or if automated correction is not available.

I've already mentioned Melodyne, which is insanely powerful for both pitch and timing corrections, but I will presume that if you have automatic pitch correction, it is through Antares Auto-Tune. This product has two modes of operation: automatic and graphical. Either way, the first thing to do is to determine the key of the song. Be careful, though, because a key change in the song will require you to make an adjustment. You'll also need to know which mode, or scale, the song is in: major, minor or something more exotic. You won't do too many tunes with Pythagorean or Slendro tuning, but be aware of such scales. You can edit your own scale if necessary, and you can also enter the scale via MIDI. You can also bypass or remove notes from the scale. If the vocalist consistently misses a particular note or two, you can correct only the offending notes, leaving the correct ones alone. You can also salvage extremely off-pitch notes by “removing” notes from the scale.
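To see why removing a note from the scale can rescue a badly missed pitch, consider this sketch of snapping a sung pitch to the nearest allowed note. It is only an illustration of the idea and not Auto-Tune's actual algorithm. Pitches are fractional MIDI note numbers, and the function and parameter names are invented.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic


def snap_to_scale(midi_pitch, tonic=0, scale=MAJOR_SCALE, removed=()):
    """Return the nearest allowed MIDI note number for a fractional pitch.

    tonic is a pitch class (0 = C through 11 = B). removed lists the scale
    offsets that should be skipped as correction targets.
    """
    allowed = {(tonic + step) % 12 for step in scale if step not in removed}
    candidates = [n for n in range(128) if n % 12 in allowed]
    return min(candidates, key=lambda n: abs(n - midi_pitch))


# The singer aims for E4 (64) in C major but lands 60 cents sharp.
print(snap_to_scale(64.6))                # 65: pulled up to F4, the wrong note
print(snap_to_scale(64.6, removed=(5,)))  # 64: with F removed, pulled back to E4
```

With F in the scale, the sharp E is closer to F and gets corrected the wrong way. Taking F out of the scale for that passage leaves E as the nearest target.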

Once key and scale are established in Auto-Tune, set the retune and tracking settings. Retune determines how quickly the pitch moves from one note to the next. At zero, pitch changes happen instantaneously, which is good for correcting monophonic instruments but not human vocals, unless you're trying to achieve the Cher effect. At higher values, the shift is slower and more natural. The default setting of 20 should usually do the trick. The Tracking knob determines how fussy Auto-Tune is at tracking pitch. Low settings (more “relaxed”) may be necessary if the recording is noisy or breathy, because the pitch-tracking algorithm can have difficulty “hearing” the pitch. Higher settings (more “choosy”) are appropriate if the vocal is particularly clear and clean. The default setting of 25 should be fine.

For graphical pitch correction, start in automatic mode, establishing key, scale, retune and tracking settings. Then, switch to graphical mode and select the audio you wish to tune. Click on the Track Pitch button, and Auto-Tune will analyze the pitch information and graphically represent it in red. Click on the Make Auto button, and Auto-Tune will then display a yellow curve indicating the pitch correction dictated by the settings you've selected. Finally, click on the Correct Pitch button to implement the changes indicated by the yellow curve in the graphic display. If the pitch is not corrected the way you want, undo the pitch correction, edit the yellow curve with Auto-Tune's suite of curve-drawing tools or change the settings in automatic mode and try again.

It makes sense that you'll use automatic mode most of the time. As always, set up key and scale first. Once the settings are made, the process indeed becomes automatic. If the song changes key, select the vocal in the modulated section, change the settings and proceed. Similarly, if there are notes that are not being caught and fixed, zoom in, select the offending sections and change the settings to fix the problem. A badly out-of-tune vocal may require working at the phrase or even word level of resolution. Try to correct the entire vocal in one shot, and if it doesn't work, select smaller sections and proceed from there.


Getting a vocal tracked, edited and tuned correctly can be a complex but worthwhile process. The vocal is what sells the song more than 90 percent of the time, so some concentrated effort to make it as good as possible will invariably increase the overall quality of the production by a significant margin. Take some time, get it exactly right and make that vocalist sound great!