Prepping A Vocal For The Mix

As far as I’m concerned, the vocal is the most important part of a song. It’s the conversation that forms a bond between performer and listener, and the focus to which other instruments give support.

And that’s why you must handle vocals with kid gloves. Too much pitch correction removes the humanity from a vocal, and getting overly aggressive with composite recording (the art of piecing together a cohesive part from multiple takes, and the subject of a future Vocal Cords) can destroy the continuity that tells a good story. Even too much reverb or EQ can mean more than bad sonic decisions, as these can affect the vocal’s emotional dynamics. But you also want to apply enough processing to make sure you have the finest, cleanest vocal foundation possible — without degrading what makes a vocal really work. And that’s why we’re here.

Vocals are inherently noisy. You have mic preamps, low-level signals, and significant amounts of amplification. Furthermore, you want the vocalist to feel comfortable, and that can lead to problems as well. For example, I prefer not to sing into a mic on a stand unless I’m playing guitar at the same time. I want to hold the mic, which means mic-handling noise is a possibility. Pop filters are another issue: some engineers don’t like to use them, but they may be necessary to cut out low-frequency plosives. In general, I think you’re better off placing fewer restrictions on the vocalist and fixing things in the mix, rather than having the vocalist think too hard about, say, mic handling. A great vocal performance with a small pop or tick trumps a boring, but perfect, vocal.

Okay, now let’s prep that vocal for the mix.


The first thing I do with a vocal is turn it into one long track that lasts from the start of the song to the end, then bounce it to disk so I can bring it into a digital audio editing program. Despite the sophistication of host software, with a few exceptions (Adobe Audition and Samplitude come to mind), we’re not quite at the point where a multitrack host can always replace a solid digital-audio editor.

Once the track is in the editor, the first stop is generally noise reduction. Sound Forge, Adobe Audition, and Wavelab have excellent built-in noise reduction algorithms, but you can also use stand-alone programs such as Diamond Cut 6. Choose a noise reduction algorithm that takes a “noiseprint” of the noise, and then subtracts it from the signal. Using this simply involves finding a portion of the vocal that consists only of hiss, saving that as a reference sample, then instructing the program to subtract anything with the sample’s characteristics from the vocal (Figure 1).

There are two cautions, though. First, make sure you sample the hiss only. You’ll need only a hundred milliseconds or so. Second, don’t apply too much noise reduction, or you may remove parts of the vocal itself or add artifacts, both of which contribute to artificiality. About 6dB to 10dB should be enough, for reasons that will become especially obvious in the next section. Removing hiss makes for a much more open vocal sound that also prevents “clouding” the other instruments.


Now that we’ve reduced the overall hiss level, it’s time to delete all the silent sections between vocal passages. If you do this, the voice will mask hiss when it’s present, and when there’s no voice, there will be no hiss at all (also see the Power App Alley in this issue on Sonar 6, which describes how to reclaim disk space when removing silence).

With all programs, you start by defining the region you want to remove. From there, different programs handle creating silence differently. Some will have a “silence” command that reduces the level of the selected region to zero. Others will require you to alter level, like reducing the volume by “-Infinity” (Figure 2). Furthermore, the program may introduce a crossfade between the processed and unprocessed section, thus creating a less abrupt transition. If it doesn’t, you’ll probably need to add a fade-in from the silent section to the next section, and a fade-out when going from the vocal into a silent section.
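
If your editor doesn’t add the transition for you, the idea can be sketched in a few lines of Python. This is only an illustration, not how any particular editor does it: the function name silence_region, the linear fade shape, and the fade length are all my own assumptions, and the audio is treated as a plain list of sample values.

```python
def silence_region(samples, start, end, fade_len):
    """Return a copy with samples[start:end] set to zero, plus a short
    linear fade-out before the region and a fade-in after it.
    (Hypothetical helper; the linear fade is an assumption.)"""
    out = list(samples)

    # Fade out over the samples leading into the silent region.
    fade_out_start = max(0, start - fade_len)
    n = start - fade_out_start
    for i in range(n):
        out[fade_out_start + i] *= 1.0 - (i + 1) / (n + 1)

    # The "silence" command: reduce the selected region to zero.
    for i in range(start, end):
        out[i] = 0.0

    # Fade in over the samples that follow the silent region.
    fade_in_end = min(len(out), end + fade_len)
    m = fade_in_end - end
    for i in range(m):
        out[end + i] *= (i + 1) / (m + 1)

    return out


# Ten full-level samples, with samples 4 and 5 silenced and
# two-sample fades on either side.
print(silence_region([1.0] * 10, 4, 6, 2))
```

In practice the fades would last a few milliseconds rather than two samples; the short length here just keeps the numbers easy to read.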


I feel that breath inhales are a natural part of the vocal process, and it’s a mistake to use hard disk recording to get rid of these entirely. For example, an obvious inhale cues the listener that the subsequent vocal section is going to “take some work.”

That said, applying any compression later on will bring up the levels of any vocal artifacts, possibly to the point of being objectionable. I use one of two processes to reduce the level of artifacts.

The first option is to simply define the region with the artifact, and reduce the gain by 3dB to 6dB (Figure 3). This will be enough to retain the essential character of an artifact, but make it less obvious compared to the vocal.
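
For a sense of what those numbers mean, a gain change in decibels converts to a multiplier of 10 raised to the power (dB / 20), so a 6dB cut roughly halves the amplitude. The Python sketch below shows that arithmetic on a region of samples; the function names are mine, for illustration only.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def apply_gain_db(samples, start, end, db):
    """Return a copy with samples[start:end] scaled by the given dB change.
    (Hypothetical helper for illustration.)"""
    gain = db_to_gain(db)
    out = list(samples)
    for i in range(start, end):
        out[i] *= gain
    return out


print(round(db_to_gain(-3.0), 3))  # about 0.708
print(round(db_to_gain(-6.0), 3))  # about 0.501
```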

The second option is to again define the region, but this time, apply a fade-in (Figure 4). This also may provide the benefit of fading up from silence if silence precedes the artifact.

Mouth noises can be problematic, as these are sometimes short, “clicky” transients. In this case, you can sometimes cut just the transient, and paste some of the adjoining signal on top of it (choose an option that mixes the signal with the area you removed; overwriting might produce a discontinuity at the start or end of the pasted region).
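
Here is one way to picture why mixing beats overwriting, as a rough Python sketch. It is my own interpretation rather than what any editor’s paste-mix command actually does: the hypothetical patch_click function copies the stretch of signal just before the click over the click itself, and crossfades at both edges so the patch blends into the original instead of butting up against it.

```python
def patch_click(samples, start, end, fade_len):
    """Cover samples[start:end] with the equally long stretch just before
    it, crossfading at both edges of the patch.
    (Hypothetical helper; the crossfade shape is an assumption.)"""
    length = end - start
    if start - length < 0:
        raise ValueError("not enough signal before the click to copy from")
    if 2 * fade_len > length:
        raise ValueError("fades are longer than the patch itself")

    out = list(samples)
    source = samples[start - length:start]
    for i in range(length):
        # w is how much of the pasted signal we hear: it ramps up at the
        # start of the patch, stays at 1.0, then ramps down at the end.
        if i < fade_len:
            w = (i + 1) / (fade_len + 1)
        elif i >= length - fade_len:
            w = (length - i) / (fade_len + 1)
        else:
            w = 1.0
        out[start + i] = w * source[i] + (1.0 - w) * samples[start + i]
    return out
```

With fade_len set to zero this becomes a plain overwrite, which is exactly the case that risks a discontinuity at the patch boundaries.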


A lot of people rely on compression to even out a vocal’s peaks. That certainly has its place, but there’s something you need to do first: Phrase-by-phrase normalization. Unless you have the mic technique of a k.d. lang, the odds are excellent that some phrases will be softer than others. If you apply compression, the lower-level passages might not be affected very much, whereas the high-level ones will sound squashed. It’s better to get the entire vocal to a consistent level first, before applying any compression. This will retain more overall dynamics. If you need to add an element of expressiveness later on (e.g., the song gets softer in a particular place, so you need to make the vocal softer), you can do this with judicious use of automation.
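
To see why uneven phrases and compression don’t mix, consider the arithmetic of a basic compressor: anything below the threshold passes untouched, and anything above has its overshoot divided by the ratio. The sketch below uses made-up settings (a threshold of -12dB and a 4:1 ratio) purely as an example; they are not recommendations.

```python
def compressed_level_db(level_db, threshold_db=-12.0, ratio=4.0):
    """Output level in dB of a simple hard-knee compressor.
    (Threshold and ratio defaults are illustrative assumptions.)"""
    if level_db <= threshold_db:
        return level_db  # below threshold: left alone
    return threshold_db + (level_db - threshold_db) / ratio


print(compressed_level_db(-18.0))  # soft phrase: unchanged at -18.0
print(compressed_level_db(-4.0))   # loud phrase: pulled down to -10.0
```

The soft phrase sails through unprocessed while the loud one loses 6dB, so the two end up with a different character as well as a different level.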

Referring to Figure 5, the upper waveform is the unprocessed vocal, and the lower waveform shows the results of phrase-by-phrase normalization. Note how the level is far more consistent in the lower waveform.

However, be very careful to normalize entire phrases. You don’t want to get so involved in this process that you start normalizing, say, individual words. Within any given phrase there will be certain internal dynamics, and you definitely want to retain them.
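
In code terms, phrase-by-phrase normalization might look something like the Python sketch below. It assumes peak normalization, that you have already marked each phrase as a (start, end) pair of sample positions, and a target peak of 0.9; the function name and all of those choices are mine, for illustration.

```python
def normalize_phrases(samples, phrases, target_peak=0.9):
    """Scale each (start, end) phrase so its loudest sample hits
    target_peak. One gain per phrase, so the dynamics inside the
    phrase are preserved. (Hypothetical helper for illustration.)"""
    out = list(samples)
    for start, end in phrases:
        peak = max((abs(x) for x in samples[start:end]), default=0.0)
        if peak == 0.0:
            continue  # nothing but silence here; leave it alone
        gain = target_peak / peak
        for i in range(start, end):
            out[i] = samples[i] * gain
    return out


# A louder phrase, a gap, then a much softer phrase.
vocal = [0.2, -0.4, 0.1, 0.0, 0.0, 0.05, -0.1, 0.05]
print(normalize_phrases(vocal, [(0, 3), (5, 8)], target_peak=0.8))
```

Because every sample in a phrase gets the same gain, the relationship between loud and soft syllables within the phrase stays intact; only the phrase as a whole moves up or down.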


DSP is a beautiful thing. Now our vocal is cleaner and more consistent in level, with any annoying artifacts tamed, all without reducing any natural qualities the vocal may have. At this point, you can start doing more elaborate processes, such as pitch correction (but please, apply it sparingly and rarely!), EQ, dynamics control, and reverb. But, as you add these, you’ll be doing so on a much firmer foundation.