Recording Dialog for the Digital Arts

Recording dialog for multimedia, video games, documentaries, and local filmmakers is a terrific way to get some paying work into your project studio. You don't need large amounts of great gear, just a few choice pieces and a decent-sounding recording space. EM delves into the how-tos of capturing the spoken word.
Project studios have cropped up like weeds in the past decade, and it can be a challenge to find paying gigs to keep them going. Recording dialog is a terrific way to get some work into your project studio. Well-recorded human speech is needed for corporate presentations, video games, documentaries, radio, audio books, Web sites, and independent film, to name only a few applications. You don't need tons of great audio gear, just a few choice pieces and a decent-sounding recording space.

The primary function of dialog is to tell the story. If the audience can't understand what's being said because of a bad recording, or is unmoved by a poor performance, the enjoyment of the story is greatly diminished. Ideally, narration goes unnoticed by the listeners; they are so captivated by the meaning of the words and the emotional energy behind them that they forget the artifice of recording and are simply absorbed in the story.

The engineer or project-studio owner wears many hats when recording voice-over, also known as VO. Your primary responsibility will be to do a good job of recording dialog with the equipment you have available. Depending on the circumstances, you might also need to line up the talent or give them creative direction to elicit the best performances. You will have to edit and organize the best takes, delivering them to the client in their final processed form. And most important, you will want to put on your most professional demeanor to keep the client, the VO artists, and the producer cool and focused during what might be long, arduous hours.

When you think of digital arts, multimedia might be the first word that comes to mind. Now that most post-production is done on a computer, however, it's all digital. Thus, digital arts comprise a much broader assortment of audio tasks than they used to.


You don't need a lot of gear to record human speech well. A small arsenal of carefully chosen, high-quality hardware and software will greatly improve your end product.

The acoustic space

For almost all recording, the most important piece of equipment is a space that sounds good, and recording dialog is no different. For VO applications, you generally want to minimize room reverberation and create a dead, direct, intimate sound. The best way to accomplish this is with a vocal booth; if you don't have a permanent one, you can build a temporary booth by using gobos to partition off a portion of your recording area. The floor should be carpeted and the walls covered with a combination of absorbers and diffusers to create a dead, clear sound with minimal resonances.
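
If you want a rough idea of how much absorption a booth needs, the classic Sabine equation relates reverberation time (RT60, the time for sound to decay by 60 dB) to room volume and total absorption. The Python sketch below is only an illustration under stated assumptions: the booth dimensions and absorption coefficients are hypothetical round numbers, not measurements, and Sabine's formula becomes less accurate in very dead rooms, where the Eyring formula is usually preferred.

```python
# Sabine reverberation estimate for a small vocal booth (metric units).
# RT60 = 0.161 * V / A, where V is the volume in cubic meters and A is the
# total absorption in metric sabins (sum of surface area * absorption coefficient).


def total_absorption(surfaces):
    """Sum area (m^2) * absorption coefficient over (area, alpha) pairs."""
    return sum(area * alpha for area, alpha in surfaces)


def rt60_sabine(volume_m3, surfaces):
    """Estimated time in seconds for sound to decay by 60 dB."""
    absorption = total_absorption(surfaces)
    if absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption


if __name__ == "__main__":
    # Hypothetical 2 m x 2 m x 2.5 m booth: floor, ceiling, and four walls.
    volume = 2.0 * 2.0 * 2.5
    # Illustrative coefficients (assumptions, not measured data).
    bare = [(4.0, 0.05), (4.0, 0.05), (20.0, 0.05)]     # hard surfaces everywhere
    treated = [(4.0, 0.30), (4.0, 0.05), (20.0, 0.70)]  # carpeted floor, treated walls
    print(f"bare:    {rt60_sabine(volume, bare):.2f} s")
    print(f"treated: {rt60_sabine(volume, treated):.2f} s")
```

With these made-up numbers the bare booth rings for about 1.15 seconds, while carpeting the floor and treating the walls brings it down to roughly 0.1 second, which is the dead, intimate character VO calls for.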

Inside the vocal booth, place a high stool, a music stand, a cup holder for water, and a mic stand with a pop filter (see Fig. 1). I put carpet or a towel on the music stand to deaden the metallic reflections. A reading lamp attached to the top of the mic stand will help your talent to see the script. You'll also need a pair of headphones connected to your talkback system so you can communicate freely with the VO talent.

If you're recording ADR (Automated Dialog Replacement, the process of rerecording an actor's voice against a previously filmed performance) or any other dialog in which the talent needs to time his or her reading to visual elements, you will need a video feed to the recording area. I use an LCD flat-panel computer monitor fed by an NTSC-to-SVGA converter box; an S-Video cable carries the picture from the analog output of my computer's digital-video-capture card to the converter. The LCD monitor is much quieter than a standard CRT video monitor, and it takes up a lot less room.


Proper microphone choice is a critical aspect of recording VO. You generally want to achieve a rich, creamy vocal sound, but your choice of microphone is tempered somewhat by the dialog application. Fortunately, most mics you would buy for recording dialog work just as well for general musical purposes.

For general-purpose dialog recording, start with a large-diaphragm condenser microphone such as a Neumann TLM 103 or U 87 or an AKG C 414 B. Those mics capture plenty of detail across the audible frequency range and handle the human voice's wide dynamic range well. You can try some budget large-diaphragm condensers, too, but be aware that many of those mics have a hyped high end that exaggerates sibilance, and they lack some of the midrange that accounts for the human voice's warmth.

For a radio-type sound, with a less-pronounced high end and big, beefy low end, look to large-diaphragm dynamic mics. The standard mic for this application is the Electro-Voice RE20, which sounds just great on voice. For an old-time, golden-age-of-radio sound, experiment with ribbon mics such as the Audio Engineering Associates R84. That model is based on the RCA ribbon mics that dominated broadcast for 30 years (see Fig. 2).

Finally, if you are recording voice for ADR, in which you need to match the sound of production dialog, you should use the same microphone that was used on the set. Because the Sennheiser MKH 416 shotgun is a fixture of production sound, it is your best bet along those lines. Unfortunately, the MKH 416 is not as useful for general music recording as the others, and it is rather pricey. I use mine for field and Foley recording as well as dialog and have never regretted the purchase.

Be sure to use a pop filter in front of all the mics I've mentioned except the RE20, which has an effective built-in pop filter. An external pop filter tames plosives and keeps diaphragms clean and free of debris. It is particularly critical on ribbon mics, whose delicate ribbons can be ruined by a heavy burst of air.

Mic preamps

A clean, colorless, high-quality mic preamp is the right choice for VO recording. Nine times out of ten, I reach for my Millennia HV-3D, which I can recommend as a quiet, high-gain, solid-state mic pre. If I were intentionally going for an old-time radio type of effect, I might use a tube mic pre in conjunction with a ribbon mic. In general, though, I want the detail and quiet amplification that a good solid-state mic pre can provide.

A high-quality channel-strip unit could serve well for dialog recording, too. Look for units that are as transparent as possible. One decent choice is the Langevin Dual Vocal Combo, which offers two channels of quiet preamplification, simple tone-control EQ, and smooth optical compression.


The human voice is a highly dynamic instrument, and compression is routinely used to get dialog to sit properly in a mix. The amount of compression you should use will vary depending on the application, but dialog that must be intelligible over loud background sources, such as in radio ads and video games, tends to get squashed pretty hard. I would never use heavy compression while recording, preferring to apply it after the fact, when it can still be adjusted. I do frequently use a Universal Audio 1176 for light peak limiting while recording, however.
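
For anyone new to compressors, the sketch below shows what threshold and ratio actually do to a signal. It is a generic feed-forward compressor with a peak envelope follower, written in Python for mono floating-point samples (full scale = 1.0). The default settings are arbitrary illustrations, and the code makes no attempt to model the 1176 or any other specific unit.

```python
import math


def db_to_lin(db):
    """Convert decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def lin_to_db(x, floor_db=-120.0):
    """Convert a linear amplitude to decibels, with a floor for silence."""
    return 20.0 * math.log10(x) if x > 0 else floor_db


def static_gain_db(level_db, threshold_db, ratio):
    """Hard-knee compressor curve: gain change in dB (always <= 0)."""
    if level_db <= threshold_db:
        return 0.0
    # Above threshold, every `ratio` dB of input yields 1 dB of output.
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db


def compress(samples, sample_rate, threshold_db=-18.0, ratio=4.0,
             attack_ms=5.0, release_ms=100.0, makeup_db=0.0):
    """Feed-forward compressor for mono float samples (full scale = 1.0)."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Peak envelope follower: fast when the level rises, slow when it falls.
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        gain_db = static_gain_db(lin_to_db(env), threshold_db, ratio) + makeup_db
        out.append(x * db_to_lin(gain_db))
    return out
```

With a -18 dB threshold and a 4:1 ratio, a peak 12 dB over the threshold comes out only 3 dB over it, a 9 dB gain reduction. Raising the ratio to 10:1 or more with a fast attack turns the same structure into a rough peak limiter, closer to the light limiting used while tracking.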

EQ and de-essing

I like to think that the best EQ is proper microphone choice and placement for a given voice. Sometimes, though, a bit of corrective equalization is necessary to enhance intelligibility or to create a more pleasing sound. Be careful not to overdo it; we humans are quite sensitive to voice, and extreme processing of the spoken word will just end up sounding wrong. Typical EQ applications for VO include a highpass filter at 60 Hz to reduce rumble, a gentle lift in the 200-to-500 Hz range to increase warmth, and a high shelf at 6 or 8 kHz to add a bit of sparkle and intelligibility. If I am doing just a bit of gentle corrective EQ and feel confident about it, I will apply the EQ as I am tracking. If my needs are more complex, as when I'm matching previously recorded voice tracks, I will wait until the mixdown stage to apply EQ.
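
To make the highpass suggestion concrete, here is a minimal Python sketch of a 60 Hz highpass built from a single biquad section, using the coefficient formulas from Robert Bristow-Johnson's widely circulated Audio EQ Cookbook. The Butterworth Q of 0.7071 and the 48 kHz sample rate are assumptions; the same cookbook also gives peaking and high-shelf formulas for the warmth and sparkle moves.

```python
import math


def highpass_coeffs(cutoff_hz, sample_rate, q=0.7071):
    """Second-order highpass coefficients (Audio EQ Cookbook), normalized by a0."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    b0 = (1.0 + cos_w0) / 2.0
    b1 = -(1.0 + cos_w0)
    b2 = (1.0 + cos_w0) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


def biquad(samples, coeffs):
    """Run samples through a Direct Form I biquad filter."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x  # shift input history
        y2, y1 = y1, y  # shift output history
        out.append(y)
    return out
```

Applied as `biquad(voice, highpass_coeffs(60.0, 48000))`, where `voice` is a hypothetical list of float samples, the filter passes the speech band essentially untouched while rolling off at 12 dB per octave below 60 Hz, so a 20 Hz rumble comes out roughly 19 dB down.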

A de-esser is a tool that lives in every VO engineer's arsenal (see Fig. 3). It does exactly what its name says: it lowers the sibilant ess sounds in speech, which can often sound sizzly and unpleasant. More specifically, a de-esser is a frequency-dependent compressor that lowers the gain of the input signal whenever the high-frequency energy generated by s and t sounds rises above a threshold. De-essers come in hardware and plug-in flavors, and both work well. By the way, you can turn any hardware compressor with a sidechain input into a de-esser. Simply route an equalized version of the voice, boosted at the center frequency you most want to tame, into the sidechain input while passing the unfiltered voice into the audio input.
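
The sidechain trick translates directly into code. In this Python sketch, a band-passed copy of the voice drives the level detector, and the resulting gain reduction is applied to the unfiltered voice, just as with the hardware patch. The 6.5 kHz center, the Q, and the threshold, ratio, and timing values are illustrative guesses; sibilance sits at different frequencies for different voices, so you would tune them by ear.

```python
import math


def bandpass_coeffs(center_hz, sample_rate, q=2.0):
    """Audio EQ Cookbook bandpass (0 dB peak gain), normalized by a0."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)


def biquad(samples, coeffs):
    """Run samples through a Direct Form I biquad filter."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def de_ess(voice, sample_rate, center_hz=6500.0, threshold_db=-30.0, ratio=4.0,
           attack_ms=1.0, release_ms=50.0):
    """Detect on a band-passed sidechain; reduce the gain of the full-band voice."""
    sidechain = biquad(voice, bandpass_coeffs(center_hz, sample_rate))
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env = 0.0
    out = []
    for x, s in zip(voice, sidechain):
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20.0 * math.log10(env) if env > 0 else -120.0
        over_db = level_db - threshold_db
        # Compress only the amount by which the sibilance band exceeds the threshold.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

Because the detector listens only to the sibilance band, a low, chesty vowel leaves the gain alone, while a loud ess pulls the whole voice down for the duration of the sibilant.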


Just about any computer-based workstation can be used for voice-over recording. A simple 2-track package is usually fine, so long as it supports robust editing. I would steer clear of tape-based systems, simply because editing is such an important part of the process. Nuendo, Sonar, Vegas, Pro Tools, Audition, and plenty of other packages will do the job. A laptop computer with a 2-channel USB audio interface could make a fine portable system for recording and editing VOs on location.


The client

The client is, of course, the person who is paying for all the fun, and ultimately the person you need to please the most. The client may also be the producer or director, in which case he or she will often take creative control during the recording process. When working with the client, ask lots of questions to find out in advance what is wanted. Ask what types of sound the client has liked in the past, and whether he or she has examples of similar material.

You need to hammer out exactly what your final deliverables are. Those would typically be short individual sound files for interactive work, or longer edited performances for projects such as documentaries or books on tape. Establish what the final delivery format should be; CD-R is most common, but DAT and multitrack digital tape are also frequently used. Determine whether you are working on a time-and-materials basis, or if you are supplying an all-in bid. If the latter, be sure to include your time for editing, processing, bouncing, and transfer of materials.

The producer or director

Many projects will have a producer or director, whose job is to supervise and creatively control the performance process. That person will usually be involved in casting (finding and hiring the right voice for the project), script development, and directing the VO talent's actual read.

The talent

Finding the appropriate voice for the job is often the most difficult part of the dialog-recording process. The budget and circumstances of the project often dictate the casting process. More often than not, a director or client will have a talent already lined up before contacting you to record a session. Sometimes, however, you are tasked with finding the VO talent yourself.

The first step is to determine whether you want to use union or nonunion talent (see Fig. 4). Most professional voice-over talents are card-carrying members of SAG (the Screen Actors Guild) or AFTRA (the American Federation of Television and Radio Artists). Such actors cost a bit more than nonunion talent, and they can work a particular job only for a producer who is a union signatory. Typically, the producer is the signatory, which means that the producer agrees to use only union voice talent for the project. Strict rules must be followed concerning breaks, the number of different voices a talent can perform in a given session, and so on.

So why consider using union instead of nonunion talent? Union actors are often more experienced, which can net you better results in less time. In addition, many projects, and some recording studios, require union talent. Darragh O'Farrell, director of the voice department at LucasArts Entertainment Company, has directed countless dialog-recording sessions, working primarily in video games and animated television. According to O'Farrell, “In general, union talent is more polished, more prepared to walk into the studio and perform. Things like mic technique are already there. There's never the need to go through a teaching process during the session.”

Nonunion talent can be a viable option for many reasons: your budget may not allow for union talent, you may not be in a locale with a strong union presence, or you may not want to hassle with the paperwork. O'Farrell has occasionally brought nonunion “diamonds in the rough” into the union to make use of their specific talents. To be sure, many nonunion talents can do your job justice, but your casting search is also likely to turn up more unqualified or inexperienced candidates. People with less experience will make more mistakes and have a narrower expressive range than the real hotshots; listen to their demo tapes and screen them carefully before moving forward.

If you're looking for the right union talent, your best bet is to contact a local casting or talent agency. Tell the agency your requirements and it can narrow the field, leading you to a number of qualified candidates. If you're looking for nonunion talent, you can contact theater groups or post on computer bulletin boards in the film and radio jobs area. Alternatively, if there is an acting or voice-over school in your area, contact someone there. Schools are always happy to find paying work for their students. In any case, insist that the prospective talent send you a demo tape or point you to a URL with samples of their voice. If the prospect hasn't gotten far enough to create a demo tape yet, move on to someone who has.


The script

Scripts for voice-over sessions often follow the standard format of film or television scripts. Sometimes, though, they are simply a series of paragraphs of text. Either way, it's best to get a copy of the script to look over before the recording session whenever possible; familiarizing yourself with the material could make the session run more smoothly. Scripts usually have wide margins to allow you to make notes and write down the numbers of the best takes. Voice directors scribble liberally on the scripts, underlining words or phrases they want the talent to punch up or emphasize. I usually do that with a red pencil.

Correctly formatting a script in advance of the session can save a great deal of time during recording and editing. If you are recording many individual lines for interactive use, it's a good idea to write the actual file names beside the scripted lines. Making your script as cinematic as possible, with descriptions of the scene and the situation surrounding each line, gives the actors the best insight into the environment in which their words will be heard; knowing the context can help elicit a better performance.

The session

The recording day arrives. You have already set up your studio before the talent, client, and director show up: you have attached and tested the microphone you are most likely to use, booted the DAW and prepared your recording template, and placed the script on the music stand. I always leave a glass or bottle of cold water with freshly squeezed lemon in the recording area for the talent, too. Why lemon? I have found that it helps reduce mouth noise, such as the little pops and snaps caused by saliva moving about. All of those clicks and pops will have to be edited out later, so keeping the talent's mouth wet will save much tedium.

After everyone arrives, I work on getting a sound, adjusting levels, testing microphones, and tweaking EQ until everyone is happy. I like to record with a pretty healthy level, but always leave a decent amount of headroom so that the talent can emote without undue fear of clipping.
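
It helps to know exactly how much headroom a take left. This small Python sketch reports the peak level of floating-point samples in dBFS (full scale = ±1.0) and flags takes that came closer to 0 dBFS than a chosen safety margin. The 6 dB margin is an arbitrary example, not a rule from this article.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples where full scale is +/-1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def headroom_db(samples):
    """How far, in dB, the loudest peak sits below 0 dBFS."""
    return -peak_dbfs(samples)


def too_hot(samples, margin_db=6.0):
    """True if the take peaked within margin_db of full scale (margin is an assumption)."""
    return headroom_db(samples) < margin_db
```

A take whose loudest peak reached half of full scale, for example, left about 6 dB of headroom; a silent take reports infinite headroom instead of raising an error.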

O'Farrell likes to put the talent at ease before the red light turns on. “The first trick to getting a good performance is letting the talent feel comfortable,” he says. “If I haven't worked with a particular actor before, I will usually schedule 45 minutes at the beginning of the session to simply hang out and chat. Once a good connection is established, everything becomes a lot easier.”

Once recording begins, on-the-fly organization becomes crucial. I keep one eye on the levels and one eye on the script, following along as the talent reads. I always have the talent slate each take, calling out the line or paragraph and the take number before reading it (for example, “Roosevelt signed the treaty, take 6,” or “Paragraph 23, take 4”). Then, when everyone is happy with a read, I write the number of the selected take in the margin of the script next to the text and circle it. That makes editing a far easier and less ambiguous process later on.
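
Some engineers like to mirror the marked-up script with a simple digital take log. The Python sketch below is one hypothetical way to do it: `Take`, `TakeLog`, and `slate()` are names made up for this example, numbering restarts for each line just as the talent's slates do, and circling a take plays the role of the circle in the script margin.

```python
from dataclasses import dataclass, field


@dataclass
class Take:
    line_id: str           # what the talent slates, e.g. "Paragraph 23"
    number: int            # take number within that line
    circled: bool = False  # set once everyone is happy with the read


@dataclass
class TakeLog:
    takes: list = field(default_factory=list)

    def record(self, line_id):
        """Log a new take; numbering restarts at 1 for each line."""
        number = 1 + sum(1 for t in self.takes if t.line_id == line_id)
        take = Take(line_id, number)
        self.takes.append(take)
        return take

    def circle(self, line_id, number):
        """Mark a take as the select, like circling it in the script margin."""
        for take in self.takes:
            if take.line_id == line_id and take.number == number:
                take.circled = True
                return take
        raise KeyError(f"{line_id}, take {number} was never recorded")

    def selects(self):
        """Circled takes in recording order: the keep list for the first edit pass."""
        return [t for t in self.takes if t.circled]


def slate(take):
    """Text the talent calls out before reading, e.g. 'Paragraph 23, take 4'."""
    return f"{take.line_id}, take {take.number}"
```

At the end of the session, `selects()` gives the same list you would otherwise compile by scanning the script for circled numbers.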

Recording VO has an established studio etiquette and chain of command, which is necessary to keep the production running smoothly. It is important to know your function within the operation, which determines when you should open your mouth and when you should keep it shut. In general, the director is in charge of making the creative decisions, and the engineer is responsible for recording the session. Typically, the engineer would only discuss a performance if there were a technical problem with the recording. Many directors get annoyed at engineers piping up with their opinion about the aesthetics of a particular read; they feel it is an unwelcome distraction to the production process.

Some directors do welcome feedback from the engineer, though. O'Farrell is among them: “I feel that the more heads put together, the better. I'm open to listening to feedback from the engineer. I've always tried to foster team spirit within my sessions, because other people might notice problems while I'm thinking about some other aspect of the script.” Unless you have worked with the director and talent before, though, it is safest to err on the side of caution and offer your opinion only if asked.


After you complete the recording session, it's time to shape the raw takes into finished materials. Exactly what form they will take depends upon the nature of the project. I start by moving through the material in a rough first pass, keeping the takes that I circled on the script and discarding the rest of the material (after making a backup, of course).

During this process, you may have to create a composite performance from multiple takes. Whenever possible, keep entire takes rather than pasting separate performances together, so the performance has one long, continuous flow. However, it is sometimes difficult for the actor to maintain consistency through a lengthy scene. O'Farrell says, “Sometimes we'll end up cutting multiple takes together, but from my standpoint that's a last resort for when you can't get it otherwise. Getting the flow in one continuous take is the ultimate goal.”
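
When takes do have to be cut together, the joins are usually smoothed with a short crossfade so that no click lands at the edit. The Python sketch below is a generic equal-power crossfade, a common choice when the two sides are different performances (uncorrelated audio); it is not taken from any particular DAW, and the fade length is up to you, often just a few milliseconds placed in a pause.

```python
import math


def crossfade_splice(take_a, take_b, fade_len):
    """Join the end of take_a to the start of take_b.

    The last fade_len samples of take_a overlap the first fade_len samples
    of take_b under an equal-power (cosine/sine) crossfade, so the result is
    len(take_a) + len(take_b) - fade_len samples long.
    """
    if fade_len <= 0:
        return list(take_a) + list(take_b)  # butt splice, no overlap
    if fade_len > min(len(take_a), len(take_b)):
        raise ValueError("fade is longer than one of the takes")
    start = len(take_a) - fade_len
    overlap = []
    for i in range(fade_len):
        # theta sweeps from 0 toward pi/2 across the overlap (sampled at bin centers).
        theta = (i + 0.5) / fade_len * (math.pi / 2.0)
        overlap.append(take_a[start + i] * math.cos(theta) + take_b[i] * math.sin(theta))
    return list(take_a[:start]) + overlap + list(take_b[fade_len:])
```

At a 48 kHz sample rate, a 10 ms fade is 480 samples. For splicing two copies of identical material, a linear (equal-gain) fade would avoid the slight level bump an equal-power curve produces.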

Once the rough cut is complete, I make a polish pass, editing out unwanted mouth noises, gulps, and other funk from the recordings. I do leave breaths in, for the most part; otherwise the performance would sound subtly unnatural. On some projects, the next step might be to lay the dialog over music or sound effects; on others, the voice is all that is required. Either way, your next step is to add any EQ or compression, as discussed earlier.

At this point, if you're recording dialog for media such as film, radio, or books on tape, you are almost finished. Bounce your finished tracks down to the agreed-upon format, record to DAT or burn to CD-R, and celebrate a job well done.


If you are recording dialog for interactive media, you have at least one more step. You will usually be asked to batch-process the files to prepare them for final delivery (see Fig. 5). That will include such tasks as sampling-rate conversion, normalization (the process of setting the peak level of all files to be identical, usually at or near 0 dBFS, digital full scale), and possibly data compression (such as MP3 conversion). In addition, files are often converted from one file format to another, such as from AIFF to Broadcast WAV. Tools such as Sound Forge, WaveLab, Peak, and BarbaBatch will help you accomplish batch conversion. I always deliver both the high-resolution original files and the low-resolution, batch-converted files to the client.
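
For the normalization step alone, a short script can do the job. This Python sketch uses only the standard library's `wave` module, which reads and writes plain PCM WAV; it handles 16-bit files only, drops any Broadcast WAV metadata, and does nothing about sample-rate conversion or MP3 encoding, so the dedicated tools named above are still the practical route for a full delivery. The -0.3 dBFS target is an assumption standing in for "at or near full scale," and the originals are left untouched so that both versions can be delivered.

```python
import array
import sys
import wave
from pathlib import Path


def normalize_peak(samples, target_dbfs=-0.3):
    """Scale 16-bit integer samples so the loudest peak lands at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = 32767 * 10.0 ** (target_dbfs / 20.0) / peak
    # Round back to integers and clamp to the legal 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


def normalize_wav(src, dst, target_dbfs=-0.3):
    """Peak-normalize one 16-bit PCM WAV file (any channel count)."""
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2:
            raise ValueError(f"{src}: this sketch only handles 16-bit PCM")
        frames = array.array("h", reader.readframes(params.nframes))
    if sys.byteorder == "big":
        frames.byteswap()  # WAV sample data is little-endian
    out = array.array("h", normalize_peak(frames, target_dbfs))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        writer.writeframes(out.tobytes())


def batch_normalize(in_dir, out_dir, target_dbfs=-0.3):
    """Normalize every .wav in in_dir, writing copies to out_dir (originals untouched)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.wav")):
        normalize_wav(src, out_path / src.name, target_dbfs)
```

Because every channel of a file is scaled by the same gain, stereo files keep their balance; each file, however, gets its own gain, which is exactly what peak normalization across a batch means.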

Armed with a little gear, a bit of chutzpah, and this article, you are ready to go record the next Orson Welles, James Earl Jones, or Meryl Streep. Best of luck, and never forget the magic words: “That was great, but unfortunately we had a technical problem. Can we take it again?”

Nick Peck is a composer, keyboardist, and sound designer-engineer. His day gig is Sound Supervisor at LucasArts Entertainment Company. Special thanks to Darragh O'Farrell for his ideas and insights into the dialog-recording process.