Master Class: Bits and Pieces: Composite Vocal Tracks

Most of the hit records of the past couple of decades owe their winning lead vocals to composite-editing techniques. For those unfamiliar with composite editing, this essentially involves recording multiple takes of a vocal performance and then combining the best parts from each take into one composite, or comp, take. This creates a superior vocal track where every line is the very best that the singer is capable of delivering.

FIG. 1: Many times, a vocal comp for a 3-minute song will have scores of edits.

For example, you might choose take 2 of a vocal track as your best overall take. Let's suppose the first and third lines of the first verse were sung great on this take, so you leave them intact. The second and fourth lines sucked, however, so you need to replace them with parts of other takes where those lines were performed better. You copy and paste the second phrase from take 5 and the fourth phrase from take 6 into take 2 at the same points in the song's timeline at which they originally occurred. Now all four lines of the first verse in the edited take 2 rock!
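The copy-and-paste workflow above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not how DP6 or any other DAW actually stores a comp. Each take is assumed to be a list of float samples, and all takes are assumed to be the same length and aligned to the song's timeline, so pasting a phrase "at the same point" simply means copying the same index range. The function name `build_comp` and the toy take data are hypothetical.

```python
def build_comp(takes, base, replacements):
    """Assemble a comp take from several aligned takes (illustrative sketch).

    takes: dict mapping a take name to its list of samples; every take is
        assumed to start at the same timeline position and have the same length.
    base: name of the best overall take, used wherever nothing is replaced.
    replacements: list of (start, end, take_name) tuples; samples in
        [start, end) are copied from that take at the same timeline position.
    """
    comp = list(takes[base])  # copy, so the original take is left untouched
    for start, end, name in replacements:
        comp[start:end] = takes[name][start:end]
    return comp


# Toy "takes" in which every sample simply holds its take number
takes = {name: [float(n)] * 8 for name, n in (("take2", 2), ("take5", 5), ("take6", 6))}
comp = build_comp(takes, "take2", [(2, 4, "take5"), (6, 8, "take6")])
print(comp)  # [2.0, 2.0, 5.0, 5.0, 2.0, 2.0, 6.0, 6.0]
```

Here take 2 supplies everything except the "second line" (indices 2 to 3, from take 5) and the "fourth line" (indices 6 to 7, from take 6), mirroring the example in the text.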

This technique can be used throughout a song to build a vocal comp where every line is the best it can be. But why limit yourself to pasting entire vocal phrases? You can also paste a single word or syllable or even one vowel or consonant. This might seem like overkill at first, but an emotive growl on a vowel sound in an otherwise flawed take can be pasted into a comp take to bring a phrase to life. On a 3-minute song, it's not uncommon for me to paste together more than a hundred pieces of multiple vocal takes to build a vocal comp, sometimes resulting in four or more edits on a single line (see Fig. 1).

Some DAWs offer highly useful features that streamline the process of copying and pasting together a comp track (see the review of MOTU Digital Performer 6.02 in the April 2009 issue). This functionality is a godsend for quickly assembling the best parts from multiple takes. The challenge, however, is to make every transition from one audio region to the next sound as transparent as possible, without any pops, clicks, abrupt changes in volume, dropped consonants, or other unnatural artifacts ruining the flow.

FIG. 2: The leading Soundbite of this butt splice ends with zero amplitude. But because the following Soundbite's amplitude is not zero at the splice point, an instantaneous jump in level occurs, causing a click.

In this article, I'll show you how to choose the best splice points for joining two audio regions together for a seamless performance. I'll also discuss the art of applying a crossfade to a butt splice (two audio regions assembled together so they are contiguous) to eliminate artifacts at the splice point. The focus here will be on comping lead vocals, but many of the same techniques can also be used for comping background vocals and instrumental tracks.

I'll start with the basics but progress quickly to tips even experienced DAW users should find helpful. I'll use Digital Performer 6 (DP6) to illustrate my points, but most DAWs can be used to execute the same basic techniques.

What's Wrong with My Butt?

FIG. 3a: This vocal's waveform at the splice point is whipsawed from a negative direction in phase to a positive direction at the zero crossover point, causing a click. The steep slopes and high-amplitude crests immediately to either side of the splice point make an audible click more likely to occur.

Indiscriminately joining two audio regions (or Soundbites, in DP's parlance) together to form a butt splice can cause a pop or click at the splice point (the common, adjoining edge at the transition point) between them. As a Soundbite's waveform progresses from positive to negative amplitude and vice versa, it passes through a zero-amplitude crossover point where it is, for a tiny fraction of a millisecond, dead quiet. If the two Soundbites are not both at zero crossover points (and thus silent) where they are joined, the level jumps instantaneously across the splice (see Fig. 2). The resulting square-edged step in the waveform creates a click or pop.
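To make the mechanism concrete, here is a minimal Python sketch (not anything DP6 does internally) that butt-splices two lists of samples and measures the instantaneous level jump at the splice point. Samples are assumed to be floats between -1.0 and 1.0; the function names and sample values are hypothetical, chosen to resemble the situation in Fig. 2.

```python
def butt_splice(leading, following):
    """Join two regions end to end with no overlap: a butt splice."""
    return leading + following


def splice_jump(leading, following):
    """Level change between the last sample of the leading region and
    the first sample of the following region."""
    return abs(following[0] - leading[-1])


def typical_step(samples):
    """Average sample-to-sample change inside a region, for comparison."""
    steps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return sum(steps) / len(steps) if steps else 0.0


lead = [0.4, 0.2, 0.0]     # ends at zero amplitude, as in Fig. 2
follow = [0.6, 0.5, 0.4]   # starts well above zero
print(butt_splice(lead, follow))  # [0.4, 0.2, 0.0, 0.6, 0.5, 0.4]
print(splice_jump(lead, follow))  # 0.6
print(typical_step(follow))
```

In real audio, adjacent samples always differ somewhat, so a jump only matters when it is large compared with the waveform's normal sample-to-sample change nearby. Here the jump is several times larger than the following region's typical step.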

After you paste two Soundbites together in your comp take, zoom the waveform display down to the sample level. Use DP's Roll tool to drag the splice point to the right or left as needed until you find a spot where the amplitude of both Soundbites is zero.
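Rolling the splice point until both Soundbites hit zero amounts to searching the timeline for an index where both takes cross zero. The sketch below assumes both takes are aligned lists of float samples on the same timeline; `zero_crossings` and `common_splice_points` are hypothetical helper names, not DP6 features.

```python
def zero_crossings(samples):
    """Indices where a waveform is at, or has just crossed, zero amplitude.

    An index i qualifies if samples[i] is exactly 0.0, or if the sign flips
    between samples[i - 1] and samples[i]. In the second case the true zero
    lies between the two samples, so i is the nearest sample just after it.
    """
    hits = []
    for i, s in enumerate(samples):
        if s == 0.0:
            hits.append(i)
        elif i > 0 and samples[i - 1] != 0.0 and (samples[i - 1] < 0.0) != (s < 0.0):
            hits.append(i)
    return hits


def common_splice_points(leading_take, following_take, target, max_distance):
    """Timeline indices within max_distance of target where BOTH takes
    cross zero, nearest first."""
    shared = set(zero_crossings(leading_take)) & set(zero_crossings(following_take))
    nearby = [i for i in shared if abs(i - target) <= max_distance]
    return sorted(nearby, key=lambda i: (abs(i - target), i))


a = [0.5, 0.3, -0.2, -0.4, 0.0, 0.3]
b = [0.2, -0.1, -0.3, 0.0, 0.0, 0.1]
print(zero_crossings(a))                        # [2, 4]
print(zero_crossings(b))                        # [1, 3, 4]
print(common_splice_points(a, b, 3, 2))         # [4]
```

An empty result corresponds to the situation in Fig. 4a. Requiring both crossings at exactly the same sample is strict; allowing a sample or two of tolerance would be more practical with real recordings.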

FIG. 3b: Although the phase trend at the splice point reverses from positive to negative, the amplitude crests and slopes of the two waveforms to either side of the splice point are mild enough that an audible click doesn't occur.

Finding a common zero crossover point may not be enough, however, to avoid creating a pop or click. Often you must choose a point where the phase of both Soundbites is trending in the same direction. For example, if the leading Soundbite is transitioning from positive to negative amplitude at the splice point, the following Soundbite should be as well; in this case, a click or pop will often occur if the following Soundbite immediately trends toward a positive amplitude. A click is most likely to occur when the slope of both waveforms at the splice point is very steep (indicating high-frequency content) and the immediately preceding and following amplitude crests are high (indicating loud volume). Conversely, you can often get away with opposing phase cycles at the splice point when both waveforms are low amplitude and gently sloping (see Figs. 3a and 3b).
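The rule of thumb in the preceding paragraph can be expressed as a rough heuristic. This is an illustrative sketch only: the thresholds `quiet` and `gentle` and the window size are hypothetical values, not figures from the article or from any DAW. For simplicity it examines both parent takes on either side of the splice index, even though only one side of each is actually heard.

```python
def trend(samples, i):
    """+1 if the waveform is rising through index i, -1 if falling, 0 if flat."""
    before = samples[max(i - 1, 0)]
    after = samples[min(i + 1, len(samples) - 1)]
    return (after > before) - (after < before)


def neighborhood(samples, i, window):
    """Samples within `window` positions of index i, clamped to the region."""
    return samples[max(i - window, 0):min(i + window + 1, len(samples))]


def splice_looks_safe(leading_take, following_take, i,
                      window=2, quiet=0.1, gentle=0.05):
    """Rough heuristic for a splice at timeline index i.

    Matching phase trends pass. Opposing trends pass only if both waveforms
    are low in amplitude (peak <= quiet) and gently sloping (largest
    sample-to-sample change <= gentle) around the splice point.
    """
    if trend(leading_take, i) == trend(following_take, i):
        return True
    for take in (leading_take, following_take):
        seg = neighborhood(take, i, window)
        peak = max(abs(s) for s in seg)
        slope = max((abs(b - a) for a, b in zip(seg, seg[1:])), default=0.0)
        if peak > quiet or slope > gentle:
            return False
    return True


falling = [0.3, 0.1, 0.0, -0.1, -0.3]
rising = [-0.3, -0.1, 0.0, 0.1, 0.3]
quiet_falling = [0.02, 0.01, 0.0, -0.01, -0.02]
quiet_rising = [-0.02, -0.01, 0.0, 0.01, 0.02]
print(splice_looks_safe(falling, falling, 2))              # True
print(splice_looks_safe(falling, rising, 2))               # False (like Fig. 3a)
print(splice_looks_safe(quiet_falling, quiet_rising, 2))   # True (like Fig. 3b)
```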

FIG. 4a: A zero crossover point cannot be found anywhere near the desired splice point for these two Soundbites.

There will be times when you simply can't find an edit point that fits the criteria just mentioned and a crossfade at the splice point won't get rid of the resulting click. In those instances, choose a splice point where one of the Soundbites has zero amplitude. Then edge edit the other Soundbite to its closest zero crossover point where its phase trend will be the same as that for the first Soundbite at its splice point (see Figs. 4a through 4c). Trimming will initially cause a slight gap between the two Soundbites. Select one Soundbite and, in DP6, Ctrl-drag it toward the other to make it snap to it, eliminating the gap. Usually, trimming and snapping a Soundbite thus will offset it from its original articulation by only about a millisecond along the timeline — not enough to affect the vocal's groove on a short phrase.
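The trim-and-snap procedure can be sketched as a search for the first near-zero sample in the following Soundbite whose waveform heads in the same direction as the leading Soundbite's phase trend at the splice point. This is a minimal sketch assuming float samples; the tolerance `tol`, the search limit `max_trim`, and all function names are hypothetical. It covers the Fig. 4b case (trimming the start of the following Soundbite); trimming the end of the leading Soundbite would be the mirror image.

```python
def sign(x):
    """+1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def samples_to_trim(following, direction, max_trim, tol=1e-3):
    """Samples to trim from the start of `following` so that it begins at a
    near-zero sample whose waveform then heads in `direction`
    (+1 rising, -1 falling). Returns None if nothing is found within
    max_trim samples."""
    limit = min(max_trim, len(following) - 2)
    for n in range(limit + 1):
        if abs(following[n]) <= tol and sign(following[n + 1] - following[n]) == direction:
            return n
    return None


def trim_offset_ms(n_samples, sample_rate):
    """How far (in ms) the trimmed Soundbite shifts once snapped into place."""
    return 1000.0 * n_samples / sample_rate


following = [0.3, 0.2, 0.0, -0.2, -0.3, 0.0, 0.2]
print(samples_to_trim(following, direction=-1, max_trim=10))  # 2
print(samples_to_trim(following, direction=+1, max_trim=10))  # 5
print(samples_to_trim(following, direction=+1, max_trim=4))   # None
print(trim_offset_ms(22, 44100))  # about 0.5 ms, similar to Fig. 4b
```

At 44.1 kHz, trimming 44 samples shifts the Soundbite by roughly 1 ms, which matches the small timing offset the text describes.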

FIG. 4b: Around half a millisecond is trimmed from the start of the following Soundbite so that it begins at a zero crossover point.

Don't be afraid to break the rules if it sounds good. Suppose the singer sang a consonant such as t or k so softly that it becomes masked by accompanying instrumental tracks. A Soundbite pasted immediately before the consonant might cause a click if you're not careful. But if the click is soft enough, it might actually accentuate the soft consonant it abuts in a way that sounds totally natural. Likewise, an extremely mild pop might make an immediately following b or p sound more intelligible. Just be sure to check the result both on headphones and on full-range monitors (or a subwoofer) to get a feel for whether it might sound excessive or artificial on other monitors.

If a consonant still sounds too soft, you can always copy the same consonant from somewhere else in the song where it was sung louder. Then simply paste it over your soft consonant to replace it. Can't find a loud-enough consonant anywhere? Make a time-range selection across the one that's too quiet, press Command + Y to make the selection a separate Soundbite, and then increase the Soundbite's nondestructive Bite Gain setting to make it louder. (The Bite Gain setting is located in DP6's Sound File Information dialog box; open the dialog box from the Studio menu or by pressing Ctrl + Option + Command + A.)
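Raising a gain setting boils down to multiplying the region's samples by a linear factor. The sketch below assumes the gain is specified in decibels, as such settings usually are; the function names and the consonant's sample values are hypothetical, and it is not DP6's implementation of Bite Gain.

```python
def db_to_linear(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def apply_bite_gain(samples, db):
    """Return a louder (or quieter) copy; the original list is untouched,
    mirroring the idea of a nondestructive gain setting."""
    factor = db_to_linear(db)
    return [s * factor for s in samples]


consonant = [0.02, -0.05, 0.04, -0.01]   # a too-quiet t, hypothetical values
louder = apply_bite_gain(consonant, 6.0)
print(round(db_to_linear(6.0), 3))       # 1.995
print(louder)
```

A boost of 6 dB roughly doubles the amplitude. Any sample pushed beyond 1.0 (full scale) would clip on output, so large boosts on already loud material need care.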

Double Cross

FIG. 4c: The following Soundbite is then snapped to the end of the leading Soundbite. Now both Soundbites exhibit zero amplitude and the same phase trend at the splice point.

In most cases, careful placement of a splice point will preclude the need for placing a crossfade across it. The fewer crossfades you have in your Project document, the less CPU drain there will be on your computer and the quicker your document will open.

That said, crossfades are sometimes necessary to eliminate clicks, pops, and other artifacts. Your DAW will likely give you a choice between making an equal-gain or equal-power crossfade at the splice point. I find that equal-gain crossfades typically yield better results when the adjoined Soundbites contain similar material (as is the case when comping vocals). In any event, you'll want to be able to adjust the length of your crossfades to achieve the best results. Open DP's Create Fades dialog box (in the Audio menu) and choose Fade Selected Time Ranges.
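One likely reason equal-gain crossfades suit comped vocals is shown below. When both Soundbites carry nearly identical (highly correlated) material, their amplitudes add directly, so gains that sum to 1.0 keep the level steady. Equal-power gains keep the *energy* of unrelated material constant, but on identical material they produce a bump of about 3 dB at the crossfade's midpoint. The sketch uses standard textbook curves (linear, and sine/cosine); individual DAWs may shape their curves differently.

```python
import math


def equal_gain(t):
    """Linear crossfade at position t (0 = start, 1 = end).
    The two gains always add up to 1.0."""
    return 1.0 - t, t


def equal_power(t):
    """Sine/cosine crossfade. The squared gains always add up to 1.0."""
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


# Midpoint of a crossfade between two identical (fully correlated) signals:
for name, curve in (("equal-gain", equal_gain), ("equal-power", equal_power)):
    out_gain, in_gain = curve(0.5)
    level_db = 20 * math.log10(out_gain + in_gain)
    print(f"{name}: {level_db:+.2f} dB")
# equal-gain: +0.00 dB
# equal-power: +3.01 dB
```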

Most of the time, a crossfade spanning only 5 or 10 ms of material is all that's needed to clean up a splice point. Particularly stubborn artifacts, however, may require a 30 ms crossfade or longer. Just be aware that a long crossfade will likely create an audible doubling effect across its span because both Soundbites will voice during the crossfade. That can be either distracting or a nice creative touch, so judge the results carefully.
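The sketch below converts the crossfade lengths mentioned above into sample counts and applies a simple equal-gain crossfade over an overlap region. It is a minimal illustration assuming float samples, a 48 kHz sample rate, and that each Soundbite's parent file supplies material across the whole fade; the function names are hypothetical.

```python
def ms_to_samples(ms, sample_rate):
    """Convert a crossfade length in milliseconds to whole samples."""
    return round(ms * sample_rate / 1000.0)


def crossfade(leading_tail, following_head):
    """Equal-gain crossfade over an overlap region.

    leading_tail: the leading Soundbite's material across the fade,
        including audio past its edge from the parent file.
    following_head: the following Soundbite's material across the fade,
        including audio before its edge from the parent file.
    Both segments sound at once for the whole overlap, which is why a
    long fade can produce an audible doubling effect.
    """
    n = len(leading_tail)
    if n != len(following_head):
        raise ValueError("overlap segments must be the same length")
    out = []
    for i in range(n):
        t = (i + 0.5) / n  # position across the fade, between 0 and 1
        out.append(leading_tail[i] * (1.0 - t) + following_head[i] * t)
    return out


for ms in (5, 10, 30):
    print(ms, "ms =", ms_to_samples(ms, 48000), "samples at 48 kHz")
# 5 ms = 240 samples at 48 kHz
# 10 ms = 480 samples at 48 kHz
# 30 ms = 1440 samples at 48 kHz
```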

FIG. 5: The center handle of a crossfade is dragged to the right here in DP6's Create Fades dialog box to create an asymmetrical crossfade. This emphasizes the leading Soundbite's material at the splice point.

If even a long crossfade doesn't clean up the splice point, try skewing its crossover point (that is, the point where both Soundbites are equally faded) so that it's slightly earlier or later in the timeline. In DP's Create Fades dialog box, grab the crossfade's center handle (the crossover point) and drag it to the left or right of center. This creates an asymmetrical crossfade across your time-range selection (see Fig. 5).

To appreciate what this accomplishes, remember that a crossfade includes material in both joined Soundbites beyond the splice point: material after the end of the leading Soundbite and material before the start of the following one are both voiced. (This is true only if each Soundbite's edge was trimmed and there is more material in each parent file beyond the current edge.) Creating an asymmetrical crossfade results in an exponential curve that accentuates one Soundbite with respect to the other for a longer period during the transition (including beyond the splice point) between them.

For example, moving the crossfade's center handle to the right of the splice point makes the material at the end of the leading Soundbite sound more pronounced and softens material at the start of the following Soundbite. Conversely, moving the center handle to the left of the splice point emphasizes the following Soundbite's material while understating the leading Soundbite's content during the transition.
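The effect of moving the crossover point can be illustrated numerically. Note that this sketch swaps in a simplified piecewise-linear, equal-gain model; it does not reproduce the exponential curve DP6 draws, and the function name and `crossover` parameter are hypothetical. The crossover value is where both gains equal 0.5, measured as a fraction of the selected time range, and the splice point is assumed to sit in the middle of that range.

```python
def asymmetric_gains(t, crossover=0.5):
    """Gains (fade_out, fade_in) at position t (0..1) across a crossfade whose
    crossover point, where both gains equal 0.5, sits at `crossover`.

    crossover = 0.5 gives a symmetrical crossfade; crossover > 0.5 keeps the
    leading material louder for longer; crossover < 0.5 favors the following
    material.
    """
    if not 0.0 < crossover < 1.0:
        raise ValueError("crossover must lie strictly between 0 and 1")
    if t <= crossover:
        fade_out = 1.0 - 0.5 * t / crossover
    else:
        fade_out = 0.5 * (1.0 - t) / (1.0 - crossover)
    return fade_out, 1.0 - fade_out


# Gains at the splice point, assumed to be the middle of the selection (t = 0.5)
for c in (0.25, 0.5, 0.75):
    out_gain, in_gain = asymmetric_gains(0.5, c)
    print(f"crossover {c}: leading {out_gain:.2f}, following {in_gain:.2f}")
# crossover 0.25: leading 0.33, following 0.67
# crossover 0.5: leading 0.50, following 0.50
# crossover 0.75: leading 0.67, following 0.33
```

Dragging the crossover to the right (0.75 here) leaves the leading material at two-thirds gain at the splice point while the following material is held down to one-third, which is the behavior the glottal-stop example below relies on.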

Let's examine a situation in which an asymmetrical crossfade would be useful. Suppose the leading Soundbite ends with a soft consonant most clearly enunciated past the splice point. It's followed by a Soundbite that begins with a hard glottal stop on a vowel. Each Soundbite sounds fine on its own, but when combined, the transition between them sounds unnaturally abrupt. You try moving the splice point later to capture more of the consonant, but that only makes the glottal stop sound harder. Worse, you can't find a later splice point that doesn't cause a loud click.

The solution is to drag the crossfade's center handle to the right of the splice point. This emphasizes the soft consonant by fading it less than the glottal stop at the splice point. It also softens the hard glottal stop by reducing its volume at the splice point more than a symmetrical crossfade would.


A serious blemish at the start or end of an otherwise fantastic Soundbite might prompt you to disqualify it from use in your vocal comp. Don't be so quick to throw it away.

A subwoofer-popping plosive (such as a hard b or p) at the start of a Soundbite can often be tamed by applying a short fade-in there. Make the fade-in long enough to span the high-amplitude transient at the start of the plosive but not so long that the consonant at the start of the lyric becomes so quiet as to be unintelligible. An undesirably hard glottal stop on a vowel sound can also be softened the same way.
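A short fade-in is just a gain ramp over the first few milliseconds of the Soundbite. This minimal sketch assumes float samples and a linear ramp; the function name is hypothetical, and the right fade length has to be found by ear for each plosive.

```python
def fade_in(samples, fade_ms, sample_rate):
    """Apply a short linear fade-in to the start of a Soundbite.

    The fade should be just long enough to span the plosive's
    high-amplitude transient; much longer and the consonant itself
    may become too quiet to understand.
    """
    n = min(len(samples), round(fade_ms * sample_rate / 1000.0))
    out = list(samples)  # leave the original untouched
    for i in range(n):
        out[i] *= i / n  # gain ramps from 0 toward 1 across the fade
    return out


# Tiny example: a 4 ms fade at a 1 kHz rate covers the first 4 samples
print(fade_in([1.0] * 6, fade_ms=4, sample_rate=1000))
# [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
```

A fade-out at the end of a Soundbite is the mirror image: the last n samples are ramped down toward zero instead.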

The end of a vocal phrase may be contaminated by drum bleed from the singer's headphones or by the singer tapping their foot to the beat of the music. If a simple edge edit sounds too abrupt (because it suddenly cuts off prominent room tone captured by the omni mic you used, for instance), try applying a short fade-out at the end of the Soundbite so the noise tapers off gradually.

Make it a point to supersize your vocal comp's waveform display by zooming in both vertically and horizontally. You'll be able to see quiet noises such as those caused by an HVAC system, the singer brushing an arm against their shirt, or a neighbor's car door slamming shut in the distance. Trim your Soundbites and fade them in and out as needed to eliminate these distractions. You might not hear them now, buried in a rough mix, but you will after compression and limiting are applied during mixdown and mastering.
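Zooming in vertically is essentially a visual search for low-level activity in stretches that should be silent. The sketch below does the same search numerically. The -50 dBFS threshold is a hypothetical choice, and the per-sample test is cruder than a real meter, which would average over a short window; the function names are also hypothetical.

```python
import math


def dbfs(x):
    """Sample level in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(abs(x)) if x != 0.0 else float("-inf")


def noisy_runs(samples, threshold_dbfs=-50.0):
    """(start, end) index pairs, end exclusive, where the level exceeds
    the threshold. Meant for stretches that should be silent."""
    runs, start = [], None
    for i, s in enumerate(samples):
        loud = dbfs(s) > threshold_dbfs
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(samples)))
    return runs


gap = [0.0, 0.001, 0.01, 0.02, 0.001, 0.0, 0.05]  # hypothetical "silence"
print(noisy_runs(gap))  # [(2, 4), (6, 7)]
```

The reason these noises matter later is simple arithmetic: compression with makeup gain raises low-level material, so a noise at -55 dBFS ends up at about -43 dBFS after 12 dB of makeup gain.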

I'm Fading Fast

Comping a vocal takes time when many edits are needed. That's partly because the success or failure of any one technique on a particular edit is hit-and-miss; on a difficult splice, you might have to try a few different things to see what works best. Truth be told, when a project's budget allows, it's not uncommon for me to spend several hours comping a single lead-vocal track.

It's the mastery of multiple techniques and the attention to detail, however, that allow you to do a ton of edits on a vocal track and have it sound completely natural. And killer.

Visit EM contributing editor Michael Cooper's website; every lead vocal there is a comp composed of between 70 and 140 edits.