Adventures in Vocal Processing

For as long as people have been singing, many have hoped for some miracle that would make them sound like better singers than they really were. Powerful computers and recent advances in audio software have given new life to such hopes by making it possible to correct and enhance vocal tracks that might otherwise be unsalvageable. In a modern studio, software-based tools extend an audio professional's ability to capture less-than-perfect vocal performances and turn them into first-rate recordings that surpass physical reality.

No matter what type of music you record, at one point or another you'll probably want to record the oldest musical instrument in existence, the human voice. The vocal track is usually the most important component of any song that isn't entirely instrumental. Virtually every studio, large or small, processes vocals to enhance their appeal. By adding reverb, compression, and other traditional forms of sweetening, recordists give vocal performances a professional polish that overcomes lackluster sound. Digital recording, however, has given rise to a new generation of tools that go beyond sweetening.

In this article, I'll survey some applications and plug-ins that are best suited for treating vocals, and I'll investigate the types of processing they provide. These products fall into categories that encompass physical modeling, pitch correction and transposition, time expansion and compression, breath and sibilance control, and simulating multiple voices. I've selected eight products (including bundles) from six developers: Antares Auto-Tune 4 and Avox, Cakewalk V-Vocal, Celemony Melodyne Studio 3.1, Synchro Arts VocALign Project 2.9, TC-Helicon Intonator HS and VoiceModeler, and Waves Vocal Bundle.

Melodyne and VocALign Project are standalone applications. Avox and Vocal Bundle are plug-in suites, but most of their plug-ins are also available separately. Auto-Tune, Intonator HS, and VoiceModeler are individual plug-ins, and V-Vocal is a dedicated vocal processor within Sonar 5 and 6 Producer Edition. All except the TC-Helicon plug-ins run native and require no DSP acceleration, and all except V-Vocal are cross-platform and run under both Mac OS X and Windows XP.

Voices in My Head

Complex audio processing consumes lots of CPU cycles and needs plenty of RAM, especially when using multiple plug-ins. I ran everything except V-Vocal on my desktop computer, a dual-processor 2.3 GHz Power Mac G5 with Mac OS X 10.4.7, 4 GB of internal RAM, and a MOTU PCI-424 card connected to a 2408mk3 audio interface. To run V-Vocal in Sonar 5, I recruited my notebook computer, a Dell Latitude D610 with a 2 GHz Pentium M, 1 GB of RAM, and Windows XP Professional, connected to an M-Audio Ozonic by means of a StarTech FireWire card.

For the plug-ins, my host applications were BIAS Peak Pro 5.2, Digidesign Pro Tools M-Powered 7.1, Steinberg Cubase SX 3.1.1, and MOTU Digital Performer 4.61. Because Melodyne Studio 3.1 also hosts AU plug-ins, I occasionally used it for that purpose as well. Additionally, I borrowed a TC Electronic PowerCore FireWire, which was necessary to run the TC-Helicon plug-ins.

For source material in my explorations, I used recordings of my own voice, audio examples included with the software, and vocal tracks taken from studio recordings. Because of copyright issues, however, I was unable to use any of the studio recordings as Web Clips. Consequently, I also recorded and manipulated samples from Zero-G's virtual instrument Vocal Forge, which pairs 1.25 GB of phrases sung by studio singers with Native Instruments Intakt Instrument.

Antares Auto-Tune 4

When the original Auto-Tune was released in 1997, its impact on pop music was almost immediate. Since then, its influence has grown to such an extent that it has become almost ubiquitous in recording studios, on CDs, and on the radio. Quite simply, Auto-Tune processes a monophonic vocal or instrumental performance and automatically corrects any flat or sharp notes. Rather than rerecording a good take plagued by bad notes, you can process it with Auto-Tune and get perfect intonation. The AU-, DirectX-, MAS-, RTAS-, TDM-, and VST-compatible plug-in has been a boon to marginal singers and a tremendous time-saver for recording engineers and producers everywhere.

FIG. 1: Auto-Tune's Make Auto function gives you the best of both modes, automatically correcting pitch to a predefined scale and displaying a pitch contour that you can edit as needed.

Auto-Tune 4 (Mac/Win; native, $399; TDM, $599) provides two pitch-correction modes, Automatic and Graphical. Automatic mode continuously analyzes an input's pitch and corrects it to match one of 29 preset scales. An onscreen keyboard visually indicates pitches that Auto-Tune detects in real time; you can use the keyboard to select notes to bypass or remove from the scale. In addition, you can use a MIDI instrument to enter scale data or to select notes to be corrected in real time, effectively transposing the source audio to track whatever you play. And although Graphical mode offers greater flexibility overall, Automatic mode provides numerous parameters for creating and controlling vibrato.
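
The scale-snapping idea behind Automatic mode can be sketched in a few lines of Python. This is a minimal illustration, not Antares's algorithm: it assumes equal temperament with A4 = 440 Hz, represents a scale as a set of pitch classes (0 = C), and uses hypothetical helper names. Removing a note from the scale is modeled simply by leaving its pitch class out of the set.

```python
import math

A4_HZ = 440.0  # assumed reference tuning (equal temperament)

def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a fractional MIDI note number (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)

def midi_to_hz(note: float) -> float:
    return A4_HZ * 2.0 ** ((note - 69.0) / 12.0)

def snap_to_scale(freq_hz: float, scale_pcs: set[int]) -> float:
    """Return the frequency of the nearest note whose pitch class is in scale_pcs.

    scale_pcs holds pitch classes 0-11 (0 = C). Notes removed from the scale
    are absent from the set, so they are never chosen. Raises ValueError if
    the set is empty. Hypothetical helper, not Antares's implementation.
    """
    detected = hz_to_midi(freq_hz)
    base = math.floor(detected)
    # The nearest note of any pitch class lies within 6 semitones, so this
    # 13-note window always contains the true nearest scale note.
    candidates = [n for n in range(base - 6, base + 7) if n % 12 in scale_pcs]
    target = min(candidates, key=lambda n: abs(n - detected))
    return midi_to_hz(target)
```

With C major as the scale, `snap_to_scale(450.0, {0, 2, 4, 5, 7, 9, 11})` pulls a slightly sharp A back to 440 Hz; drop pitch class 9 from the set and the same input snaps up to B instead, because B is now the closest remaining scale note.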

Graphical mode continuously analyzes an input's pitch too, but rather than correcting it to a predefined scale, it relies on the Pitch Graph display, which plots pitch against time. You begin by clicking on the Track Pitch button and playing the track to detect its pitches. The original pitch contour will display variations in pitch, with grid lines visually referencing fixed pitches. You then create a target pitch contour, either by clicking on the Make Curve button to duplicate the original, by clicking on the Make Auto button to draw a new contour that conforms to the current Automatic mode settings (see Fig. 1), or by drawing new targets from scratch using the curve- and line-drawing tools. You can then edit the target pitch contour using an assortment of cursor tools. Audition the selected audio and click on the Correct Pitch button to finalize your changes.

FIG. 2: Scheduled for imminent release, Auto-Tune 5 will feature an improved detection algorithm, an updated user interface, and several additional enhancements.

Either mode lets you specify the sampling rate, input type (low male voice, for example), Retune Speed, and Tracking. The Retune knob controls how quickly pitch correction is applied to the input; its fastest setting produces instantaneous correction and usually results in the characteristic artifacts you've probably heard on the radio (see Web Clip 1). The Tracking knob adjusts Auto-Tune's pitch-detection algorithm to compensate for any noise in the signal; its proper setting depends on whether the vocal track is free of extraneous sounds such as breath or background instruments.
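
One way to picture the Retune Speed control is as a time constant between the detected pitch and the target note: zero jumps straight to the target (the hard, stepped effect), while longer values glide toward it. The sketch below assumes a simple one-pole exponential glide applied per analysis frame to pitches expressed in MIDI note units; Antares doesn't document its actual correction curve, and the parameter names are illustrative.

```python
import math

def apply_retune(detected: list[float], targets: list[float],
                 retune_ms: float, frame_ms: float) -> list[float]:
    """Glide each frame's correction toward (target - detected).

    detected/targets: per-frame pitches in fractional MIDI notes.
    retune_ms: hypothetical time constant; 0 means instantaneous correction.
    """
    if retune_ms <= 0:
        return list(targets)  # instantaneous: output is exactly the target
    # Fraction of the remaining error removed on each frame.
    alpha = 1.0 - math.exp(-frame_ms / retune_ms)
    out, correction = [], 0.0
    for d, t in zip(detected, targets):
        correction += alpha * ((t - d) - correction)
        out.append(d + correction)
    return out
```

`apply_retune([69.4] * 3, [69.0] * 3, 0, 10)` returns the target pitch on every frame, whereas a 50 ms setting leaves the first frames closer to the original pitch and reaches the target only gradually.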

Although Auto-Tune 4 was the current version when I wrote this article, a major update should be available soon (see Fig. 2). Auto-Tune 5 will feature a revised user interface, host transport synchronization, full-time correction mode, a Natural Vibrato function, and a dedicated Snap to Note button. Additional enhancements will include a Humanize function that allows different Retune Speeds for short and sustained notes, and a real-time pitch-tracking display.

Antares Avox

Avox (Mac/Win, $599) is a suite of five vocal-oriented plug-ins that are also available separately from Antares Audio Technologies. They support RTAS and VST formats in Windows XP and Mac OS X, and AU on the Mac. Avox specializes in altering a single voice so that you can either change its character or make it sound like two or more voices.

FIG. 3: Antares Throat models the human vocal tract. Not surprisingly, it is most convincing when you use it to make subtle rather than drastic changes to vocal characteristics.

Avox's most complex and processor-intensive plug-in is Throat ($249), which Antares calls a physical modeling vocal designer. Throat processes monophonic vocals through an emulation of the human vocal tract and lets you specify a set of modeled vocal characteristics. A graphical Throat Shaping display helps you visualize changes as Throat adjusts the position and width of five numbered points along the vocal tract, beginning with the vocal cords and ending with the lips (see Fig. 3). It also provides points you can click-and-drag in any direction to manually reshape the vocal tract. Below the display are sliders, buttons, and pop-up menus for telling Throat about the source voice and the voice you want to model.

You begin by specifying the source's Vocal Range (Soprano, Alto/Tenor, or Baritone/Bass) and Voice Type (Soft, Medium, Loud, or Intense). Voice Type is expressed in terms of loudness because the amount of pressure applied to the vocal cords affects timbre. The Precision setting (Subtle, Medium, or Extreme) lets you indicate how accurately Throat translates the source voice, which affects the realism of the modeled voice and helps avoid undesirable artifacts; use trial and error to find the best setting. In the Add Breathiness section, Mix and Highpass Frequency sliders let you dial in filtered noise and determine its character. Manipulating the two parameters quite effectively makes a voice sound raspy or turns it into a whisper (see Web Clip 2). In the Model Glottal section, you can use a slider to adjust the waveform's pulse width and a pop-up menu to specify the modeled Voice Type. Another pair of sliders changes the length and width of the entire modeled vocal tract.
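
The Add Breathiness section, filtered noise mixed under the voice, is easy to approximate. The following sketch assumes white noise through a first-order highpass filter, with `mix` and `highpass_hz` standing in for Throat's Mix and Highpass Frequency sliders; it illustrates the general technique only and is not Antares's physical model.

```python
import math
import random

def add_breathiness(voice: list[float], sample_rate: int,
                    mix: float, highpass_hz: float, seed: int = 0) -> list[float]:
    """Blend first-order highpassed white noise into a mono voice signal.

    mix (0-1) scales the noise; highpass_hz removes low-frequency noise so
    the result reads as air rather than rumble. Illustrative sketch only.
    """
    rng = random.Random(seed)  # seeded so results are repeatable
    # Coefficient for a standard discrete one-pole RC highpass.
    rc = 1.0 / (2.0 * math.pi * highpass_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in voice:
        n = rng.uniform(-1.0, 1.0)
        hp = a * (prev_out + n - prev_in)
        prev_in, prev_out = n, hp
        out.append(x + mix * hp)
    return out
```

Raising `highpass_hz` thins the added noise toward a whispery hiss, while raising `mix` makes it more prominent, which matches the raspy-to-whisper range the two sliders cover.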

Rather than producing radical effects or gender-bending illusions, Throat's 42 presets concentrate on enhancing vocal quality and are designed to be starting points for user settings. Presets include Clarity, Shorter Throat, Larger Mouth, Softer Breathy, Nasalvox, Hoarse, and the like. Because the range of its parameters extends beyond human physiology, though, Throat can produce extreme effects if you desire.

FIG. 4: Although Duo's vocal-modeling parameters are less complex than Throat's, it can make one voice sound like two. Like all components in the Avox plug-in bundle, Duo is also available separately.

As its name suggests, Duo ($199) turns a monophonic voice into a mono or stereo pair of voices. Unlike autodoubling processors that merely duplicate a vocal track, Duo lets you apply modeling parameters to the duplicated voice. The modeling controls are much more straightforward and easier to understand than Throat's controls. Four sliders affect the model's Vocal Timbre, Vibrato, Pitch Variation, and Timing Variation (see Fig. 4). Because it affects several modeling parameters simultaneously, the Vocal Timbre slider makes the modeled voice less like the source the farther you move it from the center position. Raising the slider lengthens the modeled vocal tract, and lowering the slider shortens it (see Web Clip 3). Additional sliders let you control each voice's output level and panning.

Choir ($199) is a more ambitious vocal multiplier that turns a monophonic voice into a vocal ensemble singing in unison with the source. A pop-up menu lets you select 4, 8, 16, or 32 modeled voices. Unlike Duo, however, Choir offers no timbre controls; you can adjust only three kinds of variation among the modeled voices: pitch, timing, and vibrato. As you raise the sliders, each voice becomes increasingly different from the others while still retaining the vocal qualities of the source voice (see Web Clip 4). A single Stereo Spread slider lets you widen the stereo field.
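
To make the three variation controls concrete, here is a hypothetical sketch of how a unison-ensemble effect might give each modeled voice its own random pitch offset, timing offset, vibrato depth, and stereo position. The ranges (up to 25 cents of detune, 40 ms of delay, 30 cents of vibrato) and all names are assumptions chosen for illustration; Antares doesn't publish Choir's internals.

```python
import random
from dataclasses import dataclass

@dataclass
class VoiceParams:
    cents: float          # detune relative to the source
    delay_ms: float       # timing offset relative to the source
    vibrato_cents: float  # vibrato depth for this voice
    pan: float            # -1 = hard left, +1 = hard right

def choir_voices(count: int, pitch_var: float, timing_var: float,
                 vibrato_var: float, spread: float, seed: int = 0) -> list[VoiceParams]:
    """Generate per-voice settings; each *_var and spread is 0-1, like a slider."""
    if count not in (4, 8, 16, 32):
        raise ValueError("count must be 4, 8, 16, or 32")
    rng = random.Random(seed)
    voices = []
    for i in range(count):
        # Distribute voices evenly across the field, scaled by the spread slider.
        pan = spread * (-1.0 + 2.0 * i / (count - 1))
        voices.append(VoiceParams(
            cents=rng.uniform(-25.0, 25.0) * pitch_var,      # assumed range
            delay_ms=rng.uniform(0.0, 40.0) * timing_var,    # assumed range
            vibrato_cents=rng.uniform(0.0, 30.0) * vibrato_var,
            pan=pan,
        ))
    return voices
```

With all three variation values at zero, the voices differ only in pan position and collapse into a single louder unison, which is why raising the sliders is what makes the ensemble sound like separate singers.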

The remaining Avox plug-ins are meant to improve the sound of vocals in a mix rather than alter their number or character. Antares calls Punch ($129) a vocal impact enhancer, and you can apply it to either mono or stereo vocals. Punch combines compression and limiting to give a track the power and clarity it needs to cut through a dense mix, though extreme settings can produce distortion effects. The most crucial slider controls Impact, which makes variations in level more equal across the frequency range that vocals occupy. Two other sliders, Gain and Ceiling, allow you to increase the input level and attenuate the output after processing, respectively. Input Level and Output Level meters help to minimize clipping.
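
The Gain and Ceiling sliders relate through ordinary decibel arithmetic: a gain of G dB multiplies samples by 10^(G/20), and a ceiling of C dB caps peaks at 10^(C/20) of full scale. The sketch below shows only that static boost-then-cap stage. It interprets Ceiling as a maximum output level (an assumption) and uses per-sample hard clipping, which also hints at why extreme settings distort; a real limiter, Punch included, smooths its gain changes, and the Impact processing isn't modeled at all.

```python
def db_to_gain(db: float) -> float:
    """Convert decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def gain_and_ceiling(x: list[float], gain_db: float, ceiling_db: float) -> list[float]:
    """Boost by gain_db, then hard-clip every sample to +/- the ceiling level.

    Hypothetical static stage only; no attack/release smoothing.
    """
    g = db_to_gain(gain_db)
    ceiling = db_to_gain(ceiling_db)
    return [max(-ceiling, min(ceiling, s * g)) for s in x]
```

With 6 dB of gain and a -1 dB ceiling, a quiet 0.1 sample roughly doubles, while a 0.9 peak, which would otherwise reach about 1.8, is held at roughly 0.89.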

Sybil ($99) emulates a traditional studio de-esser, which lessens the problems caused by certain consonants when recording vocals. Sybil uses a highpass filter and a sidechain-controlled compressor to reduce a track's overall level when it detects s, t, sh, th, or ch sounds. Sliders control the compressor's threshold, compression depth, attack time, and release time, as well as the sidechain's highpass frequency. A Gain Reduction meter displays the amount of compression being applied.
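
Sybil's approach, a highpass-filtered sidechain driving a compressor that turns down the whole signal, can be sketched as shown below. The one-pole filter, the peak envelope follower, and the limiting-style gain law are simplifying assumptions; the argument names echo Sybil's sliders but are hypothetical, and this is not Antares's code.

```python
import math

def de_ess(x: list[float], sr: int, threshold_db: float, depth_db: float,
           attack_ms: float, release_ms: float, hp_hz: float) -> list[float]:
    """Sidechain de-esser: highpassed detector, broadband gain reduction."""
    # One-pole highpass used only in the sidechain (detector) path.
    rc = 1.0 / (2.0 * math.pi * hp_hz)
    a_hp = rc / (rc + 1.0 / sr)
    # Envelope-follower smoothing coefficients.
    a_att = math.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sr * release_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)
    floor = 10.0 ** (-abs(depth_db) / 20.0)  # maximum attenuation (Depth)
    env, hp, prev = 0.0, 0.0, 0.0
    out = []
    for s in x:
        hp = a_hp * (hp + s - prev)
        prev = s
        level = abs(hp)
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        # Limiting-style law: pull the sibilant band back toward the
        # threshold, but never attenuate more than the Depth floor allows.
        gain = 1.0 if env <= threshold else max(floor, threshold / env)
        out.append(s * gain)  # the whole signal is turned down, not just the highs
    return out
```

Because the gain reduction applies to the full-band signal, a loud 8 kHz tone is pulled down sharply while a 100 Hz tone at the same level barely registers in the sidechain and passes untouched.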

Cakewalk V-Vocal

Since Roland launched the VP-9000 in 2000, its groundbreaking VariPhrase technology has been acclaimed and admired for its effective audio-stretching algorithms. When Cakewalk released Sonar 5 last year, it became the first native computer software to incorporate those same algorithms in the form of a processor called V-Vocal. Specifically developed for processing vocal tracks, V-Vocal gives you wide-ranging control over pitch, tempo, loudness, and formant structure. You can use it to correct pitch, adjust phrasing, add or subtract vibrato, create harmonies, and perform other feats of studio magic.

FIG. 5: V-Vocal is the first appearance of Roland's VariPhrase technology in a digital audio sequencer, Cakewalk Sonar 5 Producer Edition. Because it isn't a plug-in, V-Vocal lets you resize its window using normal Windows techniques.

In Sonar 5 and Sonar 6 Producer Edition (Win, $619), you activate V-Vocal by selecting some audio data in Track View, and then either pulling down the Edit menu or right-clicking and selecting Create V-Vocal Clip. V-Vocal's window will appear containing a variety of controls and a graphical representation of the selected audio (see Fig. 5). Below the display are controls for mode selection, pitch correction, formant shift, and other tasks, with a tool palette on the left and buttons that control transport and other functions across the top.

When you enable Pitch mode, V-Vocal displays a 2-dimensional graph plotting the original variations in pitch as a red squiggle, overlaid by a pitch curve you can edit, shown as a yellow squiggle. You can adjust pitch manually, click on a button for instant pitch correction, or constrain pitch to follow a scale. Each audio event (typically a word, syllable, or legato phrase) has a horizontal line drawn through it called the Center Pitch; dragging it up or down will transpose the entire event's pitch. Double-clicking anywhere on the pitch curve creates a breakpoint called a Node, which you can click-and-drag up or down to transpose pitch at that location. If you use the Arrow tool to select a portion of the clip and then drag the Center Pitch, new Nodes appear and you can shift only the selected portion. You can redraw pitch using the Line or Curve tools and delete Nodes using the Eraser tool. You can also affect vibrato and other variations in pitch, amplitude, and formant content by using the LFO tool; dragging up from the Center Pitch widens the range of variation around it, and dragging down narrows it.
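
Two of these Pitch-mode gestures reduce to simple arithmetic on the pitch curve: dragging the Center Pitch adds a constant offset to every point in the event, and the LFO tool scales how far the curve swings away from that center. The sketch below assumes the curve is a list of per-frame pitches in semitones and takes the center to be their mean; Cakewalk doesn't specify how V-Vocal computes the Center Pitch, so treat both functions as illustrations.

```python
def transpose_event(curve: list[float], semitones: float) -> list[float]:
    """Like dragging the Center Pitch: shift every frame by the same amount."""
    return [p + semitones for p in curve]

def scale_variation(curve: list[float], depth: float) -> list[float]:
    """LFO-tool style edit: depth > 1 widens vibrato, depth < 1 narrows it."""
    center = sum(curve) / len(curve)  # assumed definition of Center Pitch
    return [center + depth * (p - center) for p in curve]
```

A depth of 0 flattens the event to a straight line at its center pitch, the extreme case of dragging down with the LFO tool.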

Clicking on the Time button replaces the Pitch graph with a more traditional waveform display that plots amplitude against time. You can click near the center axis or double-click between events to divide them into regions, with each division indicated by a green line. Dragging the green line to the left or right expands or compresses the duration of the region.

In Formant mode, a red line appears on the waveform's center axis. Dragging the line up or down shifts the entire clip's formant structure (see Web Clip 5). You can select regions and create Nodes as if you were shifting pitch. In a similar fashion, you can also change amplitude in Dynamics mode.

Celemony Melodyne Studio 3.1

Melodyne Studio 3.1 (Mac/Win, $699) is the most comprehensive program for manipulating the pitch and rhythm of vocal tracks. The application lets you transpose and correct pitch; shift formants; alter tempo, rhythm, and duration; automatically create harmony voices; and otherwise edit audio as if it were MIDI data. Melodyne comes in three versions to suit your needs and budget; the top-of-the-line Studio 3 version works with polyphonic as well as mono files, making it an indispensable studio tool.

Most of the action occurs in Melodyne's Editor and the Arrange window. The Editor, which opens when you load an audio file, furnishes all the time-stretching, pitch-shifting, and related functions. The Arrange window resembles a traditional multitrack digital audio sequencer, and in fact, Melodyne can function as a rather basic audio sequencer. Double-clicking on any track in the Arrange window opens the Editor for that track. For the purposes of this article, then, the Editor is the more interesting window.

FIG. 6: Celemony Melodyne displays audio events as Blobs on a grid and lets you manipulate them as if they were MIDI events.

In the Editor, individual audio events are shown as Blobs on a grid that has a note ruler on the left (see Fig. 6). Each Blob's alignment with the note ruler indicates its pitch center. Its shape indicates its amplitude envelope, and its length indicates its duration. If you click on a Blob and drag it up or down, you transpose its pitch by semitones. If you also hold down the Alt (or Option) key, you can transpose it by cents, which is necessary for manual pitch correction. A scrub function allows you to hear the changing pitch as you drag the Blob. If you select Correct Pitch from the Edit menu, Melodyne automatically repairs any pitch errors by aligning pitches with any preset or user-defined scale.
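
Dragging by semitones versus dragging by cents is a matter of resolution: a semitone is 100 cents, and a shift of c cents multiplies frequency by 2^(c/1200). A short worked check in Python (the helper names are ours, not Melodyne's):

```python
def shift_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift in cents (100 cents = 1 semitone)."""
    return 2.0 ** (cents / 1200.0)

def shift_hz(freq_hz: float, semitones: int = 0, cents: float = 0.0) -> float:
    """Coarse (semitone) plus fine (cent) transposition of a frequency."""
    return freq_hz * shift_ratio(100.0 * semitones + cents)
```

A note sung 20 cents flat needs a +20-cent nudge, a frequency change of only about 1.2 percent; a whole-semitone drag would overshoot by 80 cents, which is why fine transposition is needed for manual correction.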

You can also select more than one Blob at the same time and transpose them as a group. That technique is useful for generating harmony parts; just select the notes you want to harmonize and drag them up or down while pressing Shift + Alt. If you like, Melodyne can automatically introduce slight random pitch and timing variations, which helps to prevent comb filter effects and make harmonies sound more natural. Unless you want parallel harmony, you then select and transpose individual notes and groups of notes to adjust their intervals. If you first enable the Scale Snap function, however, the harmonies conform to whatever key you specify.
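
Scale Snap is what turns a parallel (chromatic) shift into a diatonic harmony. One way to model it is to move each note a fixed number of scale steps rather than a fixed number of semitones. The sketch assumes MIDI note numbers that already belong to the key and a key given as a root pitch class plus major-scale intervals; it illustrates the musical idea, not Melodyne's code.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale

def diatonic_shift(note: int, steps: int, root: int = 0,
                   scale: tuple[int, ...] = MAJOR_STEPS) -> int:
    """Move a MIDI note by `steps` degrees of the scale built on `root`.

    Raises ValueError if the note isn't in the key.
    """
    octave, pc = divmod(note - root, 12)
    degree = scale.index(pc)
    # divmod handles wrapping past the octave in either direction.
    octave_shift, new_degree = divmod(degree + steps, len(scale))
    return root + 12 * (octave + octave_shift) + scale[new_degree]
```

In C major, a diatonic third above C D E F G (MIDI 60 to 67) comes out as E F G A B, whereas a parallel major third (+4 semitones on every note) gives E F# G# A B, two of which fall outside the key.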

If Always Show Pitch Curve is enabled in the View menu, you'll see pitch curves superimposed on the Blobs. You can zoom vertically or horizontally to get a better look at a selected audio event or group of events. Zooming is especially handy when you're dealing with a large file or an entire song.

By default, Melodyne preserves an event's formants and amplitude so that its timbre and loudness don't change when you transpose its pitch. If you prefer, however, you can transpose formants, change amplitude, and modify other parameters just as easily as pitch, either by selecting the appropriate command from the Edit menu, by selecting one of several tools from a palette, or by right-clicking and selecting a tool from a contextual menu.

Melodyne offers numerous techniques for changing duration without affecting pitch or formant structure. If you enable Autostretch in the Transport Bar and change the entire file's tempo, all durations will be scaled accordingly. You can import tempo from a MIDI file, halve or double tempo from the Edit menu, and quantize an entire file. You can move a Blob's start or end point, which changes its duration, by clicking-and-dragging its left or right edge. If the audio is contiguous and one event leads immediately to another, manually changing duration affects the duration of events before or after the edited Blob. Changing the duration of a single event that has a rest before or after, however, has no effect on the duration of other events.
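
The difference between contiguous and separated events comes down to whether an edit moves a shared boundary. The following hypothetical model represents Blobs as [start, end] pairs in seconds. Lengthening an event that butts up against the next one pushes that neighbor's start later (moving an event's start would treat the preceding neighbor the same way), while an event followed by a rest simply grows into the gap. It also shows Autostretch-style tempo scaling. Melodyne's real behavior has more options; this only illustrates the rules described above.

```python
def set_end(events: list[list[float]], i: int, new_end: float) -> None:
    """Move event i's end; if the next event started exactly where i ended,
    move that shared boundary too, so the neighbor's duration changes.
    (No check that new_end stays before the neighbor's own end.)"""
    old_end = events[i][1]
    events[i][1] = new_end
    if i + 1 < len(events) and events[i + 1][0] == old_end:
        events[i + 1][0] = new_end  # contiguous: neighbor absorbs the change

def autostretch(events: list[list[float]], old_bpm: float, new_bpm: float) -> None:
    """Scale every start and end time when the file's tempo changes."""
    factor = old_bpm / new_bpm
    for ev in events:
        ev[0] *= factor
        ev[1] *= factor
```

Halving the tempo from 120 to 60 BPM doubles every start and end time, which is the "all durations will be scaled accordingly" behavior of Autostretch.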

One of the most exciting features of Melodyne Studio 3 and later is the ability to work with polyphonic audio files, even an entire mix. When you load a mono or stereo audio file containing vocal harmony, Melodyne displays it as a series of Blobs lined up as if they were all one pitch. Transposing the pitch of any Blob keeps all harmony parts intact; when you pitch-shift an ensemble singing a chord, Melodyne transposes the entire chord rather than the individual voices (see Web Clip 6). Polyphonic processing is useful for correcting the tuning or changing the key or tempo of an entire song, or for introducing a tempo or key change partway through the song.

FIG. 7: The first time you open VocALign Project, it will appear as a small window. Change its size, and it will retain that appearance the next time you open it.

Synchro Arts VocALign Project 2.9

Just as some software automatically corrects tuning, VocALign Project (Mac/Win, $375) automatically corrects timing discrepancies between two tracks. It works by performing a spectral analysis of both tracks and then applying variable time compression and expansion to make one line up with the other. Syncing tracks after they're recorded saves time by allowing a singer to concentrate on a good performance during an overdub, rather than on trying to duplicate a previous performance's timing.

VocALign matches the energy pattern variations of one track, called the Dub, to those of another track, called the Guide. You begin by dragging the Guide file into the upper display and the Dub file into the lower one (see Fig. 7). Because VocALign Project accepts audio files no longer than two minutes, you'll usually need to work with short clips rather than song-length tracks. The Play menu lets you audition either file or both simultaneously. Clicking on the Align button superimposes an outline of the realigned Dub's pattern over the Guide's pattern. You can then audition the two synchronized tracks or the realigned Dub alone. Six presets let you choose how tightly the Dub will be aligned to the Guide. When you're happy with the results, click on the Edit button to save the realigned Dub as a new audio file (see Web Clip 7). VocALign automatically names it and places it in a folder of your choosing.
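
Synchro Arts doesn't publish VocALign's algorithm, but the general problem of warping one energy pattern in time so it lines up with another is classically solved with dynamic time warping (DTW). The sketch below swaps in plain DTW over per-frame energy values to find which Guide frame each Dub frame should map to; a time-compression/expansion stage would then follow that mapping. The frame energies in the example are toy numbers, not measured data.

```python
def dtw_path(guide: list[float], dub: list[float]) -> list[tuple[int, int]]:
    """Return the lowest-cost alignment path as (guide_idx, dub_idx) pairs."""
    n, m = len(guide), len(dub)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(guide[i - 1] - dub[j - 1])  # local energy mismatch
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end along the cheapest predecessors.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]
```

If the Dub's energy peak arrives one frame late (Guide `[0, 1, 5, 1, 0]`, Dub `[0, 0, 1, 5, 1, 0]`), the path pairs Guide frame 2 with Dub frame 3, so the Dub would be compressed slightly at the start to line the peaks up.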

Thanks to cooperation between Synchro Arts and MOTU, VocALign Project also works from within Digital Performer (DP). First select an audio track, and then open the VocALign submenu in DP's Audio menu to specify it as the Guide. Select another track as the Dub, and then realign the Dub by choosing Align and Spot Audio from the same submenu. Two additional versions of VocALign are available as plug-ins for Pro Tools users: the AudioSuite plug-in VocALign Project for Pro Tools (Mac/Win, $375) and the RTAS plug-in VocALign Pro 4 (Mac/Win, $629 download, $699 boxed).

TC-Helicon Intonator HS

To use Intonator HS or VoiceModeler, you'll need a TC Electronic PowerCore, a hardware-based DSP accelerator that's available as an expansion card or as an external processor. Intonator HS (Mac/Win, $249) is a plug-in that specializes in pitch-correcting vocal tracks. Borrowing algorithms from TC-Helicon's respected line of voice-modeling hardware products, Intonator HS can shift pitch a maximum of six semitones up or down so that it conforms to a preset scale or to your real-time MIDI input. It uses a Hybrid Shifting algorithm (hence the HS in the name) to effectively retain a natural vocal sound even when transposing as much as half an octave.

FIG. 8: TC-Helicon Intonator HS transposes pitch so it matches whatever scale you select. Presets include chromatic, major, minor harmonic, Mixolydian, Hawaiian, Javanese Pelog, and more than 40 others.

Intonator HS's pitch meter shows the input pitch in red, with its deviation from the center indicating how sharp or flat it is (see Fig. 8). When correction is enabled, it indicates the amount of correction in blue. You can select optional meter views, such as only the output pitch or the amount of correction. The keyboard has 12 buttons for selecting the key signature, the notes of a custom scale, or a single note in Manual mode. Other controls let you vary the amount of correction, control the speed at which correction is applied, and remove rumble and hum with a low-cut filter. A unique manual pitch-bend wheel lets you vary pitch in real time, but you can't assign your MIDI controller's pitch bender to control it, nor can you change the wheel's 12-semitone range.

The Scale/Mode pop-up menu determines the pitches that the corrected notes will adhere to; you can choose from 47 preset scales and specify 3 user-defined scales. The Custom Scale setting allows you to select which of 12 pitches to include. In Manual mode, you can select a pitch to be corrected by clicking on the keyboard, which is most useful when only a single pitch is causing problems.
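
These controls map onto a simple chain: a 12-button mask defines the allowed pitch classes, an amount setting decides how much of the error to remove, and the total shift is limited to six semitones either way. This hypothetical sketch wires those three ideas together for a single detected pitch in MIDI note units; the function and argument names are ours, and the plug-in's actual algorithm is not public.

```python
def intonate(detected: float, enabled: list[bool], amount: float = 1.0,
             max_shift: float = 6.0) -> float:
    """Correct a detected pitch toward the nearest enabled pitch class.

    enabled[pc] is True for each of the 12 pitch classes (0 = C) in the
    custom scale; amount (0-1) scales the correction; the shift is clamped
    to +/- max_shift semitones. Illustrative sketch only.
    """
    if len(enabled) != 12 or not any(enabled):
        raise ValueError("need 12 flags with at least one enabled")
    base = round(detected)
    # The nearest note of any pitch class is within 6 semitones of the input.
    candidates = [n for n in range(base - 6, base + 7) if enabled[n % 12]]
    target = min(candidates, key=lambda n: abs(n - detected))
    shift = amount * (target - detected)
    shift = max(-max_shift, min(max_shift, shift))
    return detected + shift
```

With only C enabled, an F at MIDI 65 is pulled down five semitones to 60; at half the correction amount it lands at 62.5, and with a 2-semitone limit it stops at 63.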

Intonator HS's straightforward user interface makes it easy to quickly see and control what's going on. Probably because it's optimized specifically for vocals and has a limited pitch-shift range, the plug-in does an outstanding job of minimizing audible artifacts and glitches (see Web Clip 8).

TC-Helicon VoiceModeler

Another PowerCore plug-in from TC-Helicon, VoiceModeler (Mac/Win, $249), also duplicates specific capabilities of TC-Helicon's voice-modeling hardware. VoiceModeler can alter voices either subtly or dramatically, making a male voice sound female, for example, or a thin voice sound throaty. The singer's vocal dynamics can control parameters such as breath and growl, allowing you to change a performance's expressive qualities at will. You can dial in a variety of vocal personalities and build an entire choral ensemble from a single voice. VoiceModeler lets you transform a voice's timbral qualities in much the same way you use a synth plug-in to manipulate instrumental sounds.

The best way to discover VoiceModeler's capabilities is to explore its 16 presets. You can easily modify their settings and save your edits as new presets, or you can create your own presets from scratch. VoiceModeler's most essential settings are in the Effect section, which has a Bypass button, a Style menu, and sliders that control depth for each of six Effect parameters: Resonance, Spectral, Breath, Growl, Inflection, and Vibrato.

FIG. 9: VoiceModeler Styles include not only physiology-oriented presets such as Narrow Neck, Nosy Vox, and Wide Mouth, but also character-oriented presets like Teen Pop, Like a Child, and Purple Dinosaur.

Resonance controls harmonic content by emphasizing different resonating positions within the vocal tract; some Styles exhibit a deep chest resonance, for instance, and others resonate in the throat or the sinuses. The Spectral parameter is intended to emulate a singer's natural tone control. You can modulate the Spectral and Resonance parameters using VoiceModeler's Modulation section, which supplies independent depth knobs for both destinations and an LFO that can sync to tempo.
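
A tempo-synced LFO typically takes its period from a note value at the host's tempo rather than from a free rate in hertz. With a quarter note as the beat, the rate is (BPM / 60) × (division / 4), where division is 1 for a whole note, 4 for a quarter, 8 for an eighth, and so on. The helper below is an illustrative assumption, not VoiceModeler's documented behavior.

```python
def synced_lfo_hz(bpm: float, division: int, dotted: bool = False) -> float:
    """LFO rate whose period equals one note of the given division.

    division: 1 = whole note, 4 = quarter, 8 = eighth, 16 = sixteenth.
    A dotted note lasts 1.5 times as long, so its rate is divided by 1.5.
    """
    beats_per_second = bpm / 60.0            # quarter notes per second
    cycles = beats_per_second * division / 4.0
    return cycles / 1.5 if dotted else cycles
```

At 120 BPM, quarter-note sync gives 2 Hz and eighth-note sync gives 4 Hz, so the modulation stays locked to the song if the tempo changes.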

Breath presets range from relatively subtle Styles such as Natural, Soft Air, and Medium Rough to the more extreme Dark Whisper, Phlegmy, and Tracheotomy. Growl lets you impart graininess and grit to a voice. The Inflection control is multifaceted, offering gender-bending effects, randomization, and various types of scooping (see Web Clip 9). Vibrato adds periodic variations in pitch and amplitude and furnishes presets for different musical genres.

A graphical display takes up about a third of VoiceModeler's control panel. It provides a continuous visual representation of each parameter's effect on the source, allowing you to quickly grasp how each of the six Effect parameters contributes to what you hear (see Fig. 9).

Waves Vocal Bundle

Waves Vocal Bundle (Mac/Win, $1,000) is a suite that comprises five plug-ins: Tune, DeBreath, Doubler, Renaissance Channel, and Renaissance DeEsser. All except Tune run both natively and under TDM; Tune is native only. Three of the plug-ins are available separately, but Renaissance Channel and DeEsser are available only in bundles. And if you happen to own one of several other Waves bundles, you may be entitled to download Tune LT, a lite version of Tune, at no additional charge.

Treading territory similar to Auto-Tune's and V-Vocal's, Tune ($600) corrects pitch errors and lets you graphically manipulate the pitch of monophonic instruments and vocals on mono or stereo tracks. Tune uses ReWire to synchronize with your digital audio sequencer, ensuring that its display always matches what you hear during playback, allowing you to listen to all your tracks simultaneously, and letting you control transport functions from within Tune.

FIG. 10: Waves Tune is a multiformat plug-in that can scan and manipulate pitch and tempo for tracks as long as ten minutes. Pitch changes appear as a series of blocks and curves.

Tune's piano-roll Edit window, a grid that plots pitch against time, dominates its graphical user interface. After you've set parameters such as your track's vocal range (Bass through Soprano) and indicated its root and scale type, simply play the track, and Tune will detect and transpose every sharp or flat pitch to the nearest correct pitch and display notes as a series of Segments (see Fig. 10). The original pitch curves will appear as orange squiggles, with the corrected pitch curves appearing as green squiggles. Speed, Note Transition, and Ratio knobs let you tighten or loosen the correction. Although formant correction is enabled by default, you can disable it if you'd prefer an unnatural sound.

After scanning, you can use the Note tool to select any Segments you want to manipulate further. Additional tools let you split and join Segments, move around in the Edit window, zoom in or out, and redraw any portion of the pitch curves. Controls in Tune's Segmentation section let you specify conditions for segmenting correction curves into notes. If you select a Segment and then click on the Vibrato button, Tune will highlight any part of the selection it perceives as vibrato. You can then modify that vibrato using controls in the Vibrato section, or even add new vibrato that sounds natural or synthetic (see Web Clip 10).

DeBreath ($350) automatically and selectively removes or attenuates the sound of inhaling and exhaling from monophonic vocal tracks. Unlike more traditional techniques that rely on noise gating, DeBreath is based on an algorithm that detects breaths by comparing them to a library of templates and then separates them from the rest of the signal.

Although DeBreath's defaults are all you need under most circumstances, several controls let you customize its settings. A Breath Graph tracks the similarity between the audio track and the library's templates; anything above its threshold is considered a breath. The Energy Graph displays the signal's total energy; anything above its threshold is not considered a breath, even if the Breath Graph sees it as one. Sliders let you change either threshold to fine-tune the breath-detecting process. You can specify how quickly a breath fades out from the voice path and how quickly it fades back in again. You also control the amount of gain reduction applied to breaths, ensuring a natural sound that doesn't leave gaps in the vocals. You can choose to monitor either the voice or the breath individually.
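
The interplay between the two graphs amounts to a two-condition test per analysis frame: a frame counts as a breath only if its template similarity is above the Breath threshold and its energy is below the Energy threshold. This sketch assumes those per-frame values are already computed (Waves doesn't document the template matching itself) and applies the gain-reduction setting with separate linear fade-out and fade-in ramps; all names are hypothetical.

```python
def breath_gains(similarity: list[float], energy: list[float],
                 breath_thresh: float, energy_thresh: float,
                 reduction_db: float, fade_out_frames: int,
                 fade_in_frames: int) -> list[float]:
    """Per-frame linear gain: 1.0 for voice, attenuated for detected breaths."""
    floor = 10.0 ** (-abs(reduction_db) / 20.0)
    # Breath only if it resembles a template AND isn't too energetic.
    is_breath = [s > breath_thresh and e < energy_thresh
                 for s, e in zip(similarity, energy)]
    down = (1.0 - floor) / max(1, fade_out_frames)
    up = (1.0 - floor) / max(1, fade_in_frames)
    gains, g = [], 1.0
    for breath in is_breath:
        target = floor if breath else 1.0
        # Ramp toward the target so breaths fade out and back in smoothly.
        if g > target:
            g = max(target, g - down)
        elif g < target:
            g = min(target, g + up)
        gains.append(g)
    return gains
```

A frame that looks like a breath but carries high energy (a sung note with breathy onset, say) keeps full gain, which is the safeguard the Energy Graph provides.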

Renaissance DeEsser

Like other de-esser plug-ins, Renaissance DeEsser is a compressor that uses sidechain filtering to attenuate sibilance: s, t, ch, sh, and th sounds. Its crossover, like those in other Waves compressors, compensates for phase modulation that would otherwise color the sound. Renaissance DeEsser's threshold setting dynamically adapts to the input signal, also contributing to more natural results. A graph helps you visualize your settings, with colored lines representing the crossover's active and passive ranges, the gain's attenuation range, and frequency-dependent attenuation.

You can choose to monitor either the audio or the sidechain. After you've aurally identified the problem frequency, simply adjust the threshold to attenuate it. Use the Range slider to set the maximum gain reduction, which determines how much de-essing is applied. Additional controls let you specify highpass or bandpass filtering for the sidechain, the filter's cutoff or center frequency, and a compression mode. Renaissance DeEsser comes with several presets appropriate for male or female voices or for a full mix.
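A generic sidechain de-esser can be sketched in a few lines of Python. This is a textbook design, not Renaissance DeEsser's unpublished algorithm: a one-pole highpass filter stands in for the sidechain filter, an envelope follower measures sibilant energy, and gain reduction above the threshold is capped by a Range value. The adaptive threshold, bandpass option, and compression modes are not modeled, and all names and default values are assumptions.

```python
import math


def deess(samples, sample_rate, cutoff_hz=5000.0, threshold_db=-30.0,
          range_db=10.0, ratio=4.0, release_ms=50.0):
    """Wideband de-esser sketch: a highpass sidechain drives gain reduction."""
    # One-pole highpass coefficient for the sidechain filter.
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    # Envelope follower: instant attack, exponential release.
    release = math.exp(-1.0 / (release_ms * 0.001 * sample_rate))

    out = []
    prev_x = prev_hp = env = 0.0
    for x in samples:
        hp = alpha * (prev_hp + x - prev_x)     # sidechain keeps only highs
        prev_x, prev_hp = x, hp
        level = abs(hp)
        env = level if level > env else env * release
        env_db = 20.0 * math.log10(env) if env > 0.0 else -120.0
        over = env_db - threshold_db
        # Compress above threshold, but never reduce by more than range_db.
        reduction = min(range_db, over * (1.0 - 1.0 / ratio)) if over > 0 else 0.0
        out.append(x * 10.0 ** (-reduction / 20.0))
    return out
```

This version turns down the whole signal while sibilance is present (wideband de-essing); a split-band design would apply the gain reduction only to the high-frequency band.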

Renaissance Channel
Renaissance Channel furnishes 4-band EQ, compression, and other channel-strip functions. Although it is part of Waves Vocal Bundle, it has no vocal-specific features, so I won't discuss it further here.

Despite its name, Doubler ($200) offers more than simple voice doubling; it adds as many as four voices to a vocal track. Doubler is actually six plug-ins: 2- and 4-voice versions in mono, stereo, and mono-to-stereo configurations. Every version supplies a control strip for each doubled voice, where you can set the voice's gain, delay, feedback, and tuning parameters and enable or disable it. You can also shift any voice down an octave and set the depth and rate of LFO modulation. Stereo versions additionally let you pan each voice, including the original.


FIG. 11: Doubler can add two, three, or four voices to a source voice, each with its own delay, detuning, pan, and additional user parameters.

Three displays give you visual feedback and allow you to change parameters graphically (see Fig. 11). One shows the relative gain and stereo positioning of all voices. Another lets you individually change each voice's delay time and detuning, with each voice represented as a colored ball surrounded by moving lines that represent modulation. In the EQ display, you can apply high- and low-shelf filtering to all the doubled voices and graphically change the equalization curve.

Doubler is quite effective at thickening vocal tracks and giving them a nice studio sheen (see Web Clip 11). It's also capable of LFO-controlled panning, pitch-bending, and delay effects ranging from flange to echo. Waves furnishes 19 Doubler presets for a variety of applications and effects.
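The basic idea behind delay-based doubling can be sketched in Python: each added voice is a delayed, gain-scaled, panned copy of the source, and a slowly modulated delay time produces the slight pitch drift that makes a copy sound like a separate performance. Waves doesn't say how Doubler detunes its voices, so treating detuning as LFO-modulated delay is an assumption here, as are the function name and the voice-parameter keys; feedback, octave shifting, and EQ are omitted.

```python
import math


def double_voice(samples, sample_rate, voices):
    """Mix a mono source and extra voices into stereo (left, right) lists.

    Each voice is a dict with hypothetical keys: gain_db, delay_ms,
    pan (-1 = hard left, +1 = hard right), lfo_depth_ms, lfo_rate_hz.
    The dry source stays centered.
    """
    n = len(samples)
    center = math.sqrt(0.5)                          # constant-power center pan
    left = [s * center for s in samples]
    right = [s * center for s in samples]
    for v in voices:
        gain = 10.0 ** (v["gain_db"] / 20.0)
        angle = (v["pan"] + 1.0) * math.pi / 4.0     # 0 (left) .. pi/2 (right)
        g_left, g_right = gain * math.cos(angle), gain * math.sin(angle)
        for i in range(n):
            lfo = math.sin(2.0 * math.pi * v["lfo_rate_hz"] * i / sample_rate)
            delay_ms = max(0.0, v["delay_ms"] + v["lfo_depth_ms"] * lfo)
            pos = i - delay_ms * sample_rate / 1000.0    # fractional read position
            if pos < 0:
                continue                                 # voice hasn't entered yet
            j = int(pos)
            frac = pos - j
            nxt = samples[j + 1] if j + 1 < n else 0.0
            value = samples[j] * (1.0 - frac) + nxt * frac   # linear interpolation
            left[i] += value * g_left
            right[i] += value * g_right
    return left, right
```

Short delays with fast modulation drift toward flanging, while long delays with little modulation become audible echoes, which matches the range of effects described above.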

Turn Geese into Songbirds

As you can see, vocal-processing software can accomplish many transformations that were previously impossible. Only a few years ago, you couldn't change a recording's pitch without affecting its duration, or change its duration without affecting its pitch. How well vocal-processing software performs, though, depends a lot on your expectations. It's still very difficult to take a recording of someone who can't sing and turn it into a passable performance. With enough effort, however, you can seemingly cure a bad sense of pitch and a worse sense of rhythm. You can modify a singer's vocal timbre, enhance his or her breathing technique, and turn one singer into as many as you need. You can even add richness and tone to a voice that has none; just don't expect it to sound real.

When the original performance is lacking, you often have to push software to extremes, and audible artifacts then give the sound an unavoidably artificial quality. The problem seems inherent in digital audio processing: whenever I pushed beyond the point where artifacts became obvious, every program I tried tended to make human voices sound synthetic. Of course, a voice that sounds like a machine is desirable for some types of music, and nothing makes a voice stand out like destroying its organic nature.

To process vocals effectively with any of the software surveyed here while retaining their natural qualities, the trick is to recognize immediately when something begins to sound artificial and pull back a bit. Admittedly, that can be difficult when you're so wrapped up in an editing session that you lose perspective. If you're correcting pitch, try a slower correction speed. If you're transposing a harmony part, use the smallest interval you can get away with, and accept that an octave is simply too far. If you're compressing or expanding time, never go beyond halving or doubling the tempo, and avoid going that far whenever you can. The difference between one person's voice and another's may be subtler than you imagine. It also helps to bury vaguely synthetic-sounding voices deep in the mix or smear them with reverb. Believe it or not, such techniques can work very well and still sound good.
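The interval and tempo limits above are easy to express as ratios. In this small Python helper (the function names are illustrative), an equal-tempered interval of n semitones corresponds to a frequency ratio of 2^(n/12), so an octave is a 2:1 ratio, and the halving-or-doubling rule for time compression and expansion means keeping the tempo ratio between 0.5 and 2.0.

```python
def interval_ratio(semitones):
    """Frequency ratio of an equal-tempered interval (12 semitones = 1 octave = 2.0)."""
    return 2 ** (semitones / 12)


def tempo_change_ok(original_bpm, target_bpm, max_factor=2.0):
    """True if the new tempo stays within 1/max_factor to max_factor of the original."""
    ratio = target_bpm / original_bpm
    return 1 / max_factor <= ratio <= max_factor


print(round(interval_ratio(3), 4))   # minor third up: 1.1892
print(interval_ratio(-12))           # octave down, half the frequency: 0.5
print(tempo_change_ok(120, 70))      # True: less than halving the tempo
print(tempo_change_ok(120, 50))      # False: more than halving the tempo
```

Lowering max_factor (to 1.5, say) is one way to honor the advice to avoid going all the way to the limit.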

Don't forget that singers rely on audio professionals to make them sound good, and sounding good matters more than a flawless recording. Achieving natural results depends on using your ears and accepting whatever limitations they reveal. Unless you or your client is willing to accept vocal tracks that sound unnatural, you will often find it preferable to record another take rather than waste time trying to process tracks into submission. The key to successful vocal processing, then, is to know when enough is enough.

Since 1985, Associate Editor Geary Yelton has written hundreds of reviews and feature stories for EM.

Antares Audio Technologies

Synchro Arts

Zero-G/EastWest (distributor)