Picture Perfect Sound

As film, video, and audio converge in computer-based workstations, individual users are forced to wear a steadily increasing number of hats. Traditional barriers that once clearly separated jobs in sound, music, and picture are starting to crumble, and musicians have a growing need for knowledge in related fields.

Whether you produce sound effects, music, or dialog, as an audio specialist you need to understand how audio works in relation to film and video so you can properly prepare your materials. Furthermore, a well-rounded knowledge of technical issues and terminology will make you a better team player on audio-visual projects. Let's take a close look at how film and video are related, and how that relationship may affect synchronized audio tracks.

As most people know, film is developed and projected in much the same way as the slides you take with your personal camera, whereas video records images as magnetic signals on tape. Film's picture resolution is quite high, so it typically presents viewers with a richer visual experience than video. Vibrant colors, fine details such as hair, and subtle lighting are much more apparent and well defined on film. Because of this better picture quality and film's "analog" look, most productions that have the budget for it use film.

Video, on the other hand, is quicker, cheaper, and easier to work with. To save time and money, most film productions are initially edited on video. Film is shot, developed, and then transferred to video workstations, where editors and directors can make choices about which shots to use. Once they've decided on most of the changes, the workstations generate a "cut list," and the film is spliced together by hand with the age-old tools of razor blades and tape.

Whenever you switch between video and film, you have an instant problem: frame rate. The frame rate is the number of individual pictures shown each second (measured in frames per second, or fps). Video standards around the world use various frame rates from 25 fps (PAL and SECAM) to 60 fps (HDTV). The television standard we currently use in the United States, called NTSC, was adopted by the FCC in 1953. (Of course, the standards are changing as digital TV emerges, but that's another article.)

FIG. 1: One video frame is made up of two interlaced video fields.

NTSC video flashes around 30 individual frames every second, whereas film gives you only 24 frames per second. In addition, video subdivides each frame into two interlaced segments, called odd and even fields, for a total of 60 fields per second (see Fig. 1). Obviously, playing a 24 fps theatrical feature on a 30 fps VCR requires some adjustment.

Film is transferred to video in a telecine bay, a very expensive piece of equipment that few of us will ever have the pleasure of owning. "Telecine" (pronounced te-le-SIN-ay) was coined from the words "television" and "cinema." You must understand some of the telecine process in order to understand how audio works with film and video.

A direct frame-for-frame transfer between film at 24 fps and video at 30 fps would be problematic. The videotape would run 6 frames too fast every second, thereby shortening the overall picture by 21,600 frames (12 minutes) every hour. The audio tracks would go out of sync almost immediately on playback.
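
The arithmetic behind those figures can be checked with a short Python sketch. The function and constant names are illustrative assumptions, not part of any telecine software, and the rates are the nominal 24 fps and 30 fps used in this paragraph.

```python
from fractions import Fraction

FILM_FPS = 24    # film frames per second
VIDEO_FPS = 30   # nominal NTSC video frames per second


def direct_transfer_shortfall(film_seconds):
    """Return (frames_short, seconds_short) for a one-to-one transfer.

    Each film frame becomes exactly one video frame, and the tape then
    plays back at VIDEO_FPS, so it runs out of picture early.
    """
    film_frames = FILM_FPS * film_seconds
    # Frames the tape would need to fill the same running time.
    frames_needed = VIDEO_FPS * film_seconds
    frames_short = frames_needed - film_frames
    seconds_short = Fraction(frames_short, VIDEO_FPS)
    return frames_short, seconds_short


frames, seconds = direct_transfer_shortfall(3600)
print(frames, float(seconds) / 60)  # 21600 12.0
```

For one hour of material the tape comes up 21,600 frames short, which is 12 minutes at 30 fps.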

FIG. 2: The SMPTE-A 2-3 pull-down transfer process converts four film frames into five video frames.

A clever technique called 2-3 pull-down solves the problem. Instead of making each frame of film one frame of video, the telecine process takes four frames of film and makes them into five frames of video. The first frame of film is pulled down into a projector, and two video fields are electronically recorded onto tape. The second film frame is then pulled down and recorded as three video fields. The same process is repeated for the third and fourth film frames. Because frames of film are recorded in alternating groups of two or three video fields, we call this process "2-3 pull-down" (see Fig. 2).

The ten fields recorded from the four film frames make up five video frames. (Remember this four-to-five ratio; it's important.) In every second, this adds 6 frames to the original 24 frames, producing a 30 fps video copy that maintains the timing of the original film.
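
As a rough illustration of the cadence just described, the following Python sketch maps film frames to video frames. The function name and the use of letters to label film frames are assumptions made for the example; a real telecine works on images, not labels.

```python
def pulldown_2_3(film_frames):
    """Map film frames to video frames using a 2-3 field cadence.

    Film frames are recorded as 2 and 3 video fields in alternation,
    and consecutive pairs of fields are then grouped into video frames.
    """
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 2 if index % 2 == 0 else 3
        fields.extend([frame] * repeats)
    # Pair up the fields; a trailing unpaired field is dropped here.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


video = pulldown_2_3(["A", "B", "C", "D"])
print(video)
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]

# One second of film (24 frames) yields 30 video frames.
print(len(pulldown_2_3(list(range(24)))))  # 30
```

Note that the third and fourth video frames each combine fields from two different film frames, which is why individual video frames do not map cleanly back to single film frames.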

If you want to edit sound to film, understanding the pull-down process is very important because most sound and music editors edit their sounds to video. (They don't have the luxury of building movie theaters in their homes.) They lay their sounds to 30 fps videotape that has been "pulled down" from the 24 fps film for the big screen.

So what happens when you take the golden soundtrack you synched to video and attempt to play it with film? The sound should be in sync with the picture, right? Unfortunately, that's not what happens. In fact, the audio begins to drift, and the longer you play it, the more out of sync it gets.

FIG. 3: This formula shows how the four-to-five ratio is maintained even though the video speed is actually slightly less than 30 fps.

But wait a minute. Didn't I just finish explaining how 2-3 pull-down keeps the video copy in sync with the film? Well, I lied, but just a little. I stated that video runs at around 30 fps. It actually runs at 29.97 fps because of a workaround that was introduced when color broadcasting was invented. That means film frames don't break down so neatly into video frames. During the telecine transfer process, the playback of film is slowed down a bit to maintain our four-to-five frame ratio. The ratio is actually 23.976 film frames for every 29.97 frames of video. Subtracting 0.1 percent (the difference between 30 fps and 29.97 fps) from 24 fps gives us 23.976 fps (see Fig. 3).
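
The relationship can be verified numerically. The sketch below assumes the NTSC color slowdown is exactly 1000/1001, the commonly cited exact value that the text rounds to 0.1 percent; the variable names are illustrative.

```python
from fractions import Fraction

# Assumption: exact NTSC color factor, rounded in the text to "0.1 percent".
NTSC_SLOWDOWN = Fraction(1000, 1001)

video_fps = 30 * NTSC_SLOWDOWN  # 30000/1001
film_fps = 24 * NTSC_SLOWDOWN   # 24000/1001

print(round(float(video_fps), 3))  # 29.97
print(round(float(film_fps), 3))   # 23.976
print(film_fps / video_fps)        # 4/5
```

Because both rates are scaled by the same factor, the four-to-five ratio between film frames and video frames is unchanged.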

This means that when you look at film transferred to videotape flashing 29.97 fps on your VCR, you are actually seeing 23.976 of the original film frames per second. If your audio is still traveling at 24 fps, it is moving too fast and will run ahead of your picture. The audio must be slowed down to match the actual rate of your picture. This is where your audio editor's sample-rate pull-down command comes into play.
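
To get a feel for how quickly the drift accumulates, here is a small Python sketch. It assumes the slowdown factor is exactly 1000/1001 (the text's "0.1 percent"), and the function name is hypothetical.

```python
from fractions import Fraction

# Assumption: telecined picture runs 1001/1000 times as long as at film speed.
PICTURE_STRETCH = Fraction(1001, 1000)


def drift_seconds(program_seconds):
    """Seconds by which film-rate audio leads the telecined picture.

    `program_seconds` is the length of the material at film speed (24 fps).
    """
    picture_duration = program_seconds * PICTURE_STRETCH
    return picture_duration - program_seconds


print(float(drift_seconds(60)))    # 0.06
print(float(drift_seconds(3600)))  # 3.6
```

Under that assumption the audio is 0.06 seconds (nearly two video frames) early after one minute and 3.6 seconds early after an hour.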

By activating the pull-down function, you slow the audio playback rate by 0.1 percent to match the video. On your DAW, the sample-playback rate drops by 0.1 percent (to about 44.056 kHz instead of 44.1 kHz). Film rate refers to sound that was recorded in sync with the film camera, while video rate is defined as the pulled-down rate of the sound.
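
The pulled-down sample rate is simple to compute. This Python sketch assumes the exact factor is 1000/1001 and uses a hypothetical helper name; it is not taken from any particular DAW.

```python
from fractions import Fraction

# Assumption: exact pull-down factor behind the "0.1 percent" figure.
PULL_DOWN = Fraction(1000, 1001)


def pulled_down_rate(sample_rate_hz):
    """Return the playback sample rate in Hz after a 0.1 percent pull-down."""
    return float(sample_rate_hz * PULL_DOWN)


print(round(pulled_down_rate(44100), 1))  # 44055.9
print(round(pulled_down_rate(48000), 1))  # 47952.0
```

The first result is the 44.056 kHz figure quoted above; the second shows the same adjustment applied to 48 kHz material.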

It is important to realize that time codes and frame rates can be, but are not necessarily, the same thing. In general, music production uses 30 fps time code because there is no need to sync to picture. Video production and television broadcast, however, use a form of 29.97 fps time code because all timing is dependent on the video frame rate. Film productions typically employ the same time codes used by music or video productions.

Because of the difference in synchronization schemes used by film cameras and field audio equipment, it's common to have a picture frame rate of 24 fps and an audiotape recorded with 30 fps time code. Some productions may not synchronize their camera and sound equipment. In that case, the production sound mixer simply stripes the tape at (hopefully) 30 fps and lets the audio equipment run free. Time code alone will not tell you whether you're running at film rate or video rate. The actual picture frame-rate information must also be known.

When you have field audio (such as dialog) recorded at film rate (24 fps) and you want to edit that audio to pulled-down video (29.97 fps), you must set the pull-down to slow the audio playback (by 0.1 percent) in order to maintain sync with the original film frames (which have been slowed to 23.976 fps). The reverse is also true. When you have sound effects and music edited to telecined video (which shows 23.976 film frames per second), you must speed up the audio (by 0.1 percent) to maintain sync with the projected film (24 fps).
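
Both directions can be captured in one small Python sketch. The labels "film" and "video" and the function name are assumptions for illustration, as is the exact 1000/1001 factor; actual workstations name these settings differently.

```python
from fractions import Fraction

# Relative speed of each rate; "video" rate is film rate pulled down.
RATES = {"film": Fraction(1), "video": Fraction(1000, 1001)}


def playback_speed_factor(source_rate, target_rate):
    """Speed factor for audio made at source_rate, played at target_rate."""
    return RATES[target_rate] / RATES[source_rate]


def percent_change(factor):
    """Express a speed factor as a percentage change, rounded to 2 places."""
    return round(float((factor - 1) * 100), 2)


print(percent_change(playback_speed_factor("film", "video")))  # -0.1
print(percent_change(playback_speed_factor("video", "film")))  # 0.1
print(percent_change(playback_speed_factor("film", "film")))   # 0.0
```

A negative result means the audio must be slowed (pull-down), a positive one means it must be sped up (pull-up), and zero means no adjustment is needed.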

Most professional workstations have appropriate settings for dealing with any pull-down situation, but the operator must know at what rate (video or film) the source tapes were made.

Unfortunately, pull-down terminology is not standardized, and confusion may arise when you try to decipher various owners' manuals. Manufacturers generally lump the topics of picture-frame manipulation and audio-playback speeds under the general heading "Pull-down." You are expected to know the difference (even if they don't) when they discuss the issues.

I've run across various terms for the pull-down settings, including "pull-in and pull-out," "pull-down and pull-up," and "1.00 and 0.99." Time-coded DATs may force a pull-down when a tape striped with 30 fps time code is inserted into a machine set to 29.97 fps time-code settings. Most users' guides do not give a detailed explanation of the process, but some explain the relationship between the settings and the sample-playback rates. Armed with an understanding of pull-down and with your manual, you should be able to figure out proper settings.

When a project involves film or video, timing is derived from the visual elements, and the audio has to be adjusted to achieve synchronization. If you're doing audio work that involves both visual formats, you must know how to deal with the differences.

The ability to go directly from DAW to mixing is now a reality, and this method will grow in popularity as production budgets are squeezed. There never seems to be enough time, and that causes a lot of frayed nerves as a project nears the end of the post-production process. By remembering the differences between audio-visual formats and dealing with them correctly, you can avoid the frustration of unnecessary delays.

Gene Takahashi works for Pixar Animation Studios. He is also starting up MultiMedia Audio Productions, a provider of audio services for corporate multimedia.