We live in a world that's increasingly defined by the term multimedia. It's rare to find a studio these days that isn't equipped with a video monitor, and even video-editing capabilities are becoming commonplace. The only concerts that don't involve video displays are those performed by symphony orchestras, and many of us find that new creative and commercial opportunities require us to marry sounds to images (for more on this subject, see this month's cover story, “Picture Window,” on p. 40).
It's useful, then, to have a grasp on the essential characteristics of digital video so that we can better embrace its possibilities. In this article, I'll discuss video basics from formats to standards to codecs.
FIG. 1: This table shows common video frame rates.
As you know, there's no such thing as a moving picture — the illusion of movement is created by a rapid-fire slide show of still images. Each successive image shows a scene captured at (or created to represent) a subsequent moment in time, and our brains happily draw the connecting thread between them. If the slide show is too slow, however, our brains distinguish the slides as individual images. The speed of the slide show is called the frame rate, and a frame rate of about 10 frames per second (some experts believe it is closer to 16) is the point at which we make the leap from still images to a perception of motion.
There are several common frame rates in professional video use. In the world of film, 16- and 35-millimeter cameras operate at 24 fps (see Fig. 1), and digital video often uses the same frame rate to achieve a film look. In most parts of the world, however, analog video follows either the PAL or SECAM standard, both of which operate at 25 fps. The NTSC standard, used primarily in North America and Japan, operates at 29.97 fps. For complex reasons dating to the transition from black-and-white to color, NTSC timecode counts 30 frames per second, which is slightly faster than the frames actually play, so drop-frame timecode skips two frame numbers each minute (except every tenth minute) to keep the count in step with the clock. Never fear, however: only frame numbers are dropped, not the video information itself. For NTSC compatibility, digital video can also use 23.976 fps (often rounded to 23.98), counted drop-frame or not depending on the application.
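To make the drop-frame scheme concrete, here is a minimal Python sketch (the function name is my own invention, not part of any standard library) that converts a running frame count into drop-frame timecode, skipping numbers ;00 and ;01 at the start of every minute except each tenth minute:

```python
def drop_frame_timecode(frame: int) -> str:
    """Convert a 29.97 fps frame count to HH:MM:SS;FF drop-frame timecode.

    No actual frames are discarded; certain frame *numbers* are simply
    never used, so the timecode stays in step with the real-time clock.
    """
    fps = 30                         # nominal counting rate
    per_min = 60 * fps - 2           # 1,798 numbered frames per dropping minute
    per_10min = 10 * per_min + 2     # 17,982 per ten minutes (minute 0 keeps all 1,800)
    tens, rem = divmod(frame, per_10min)
    # Add back the numbers skipped so far: 18 per full ten-minute block,
    # plus 2 for each dropping minute inside the current block.
    drops = 0 if rem < 60 * fps else 2 * ((rem - 60 * fps) // per_min + 1)
    frame += 18 * tens + drops
    ff = frame % fps
    ss = frame // fps % 60
    mm = frame // (fps * 60) % 60
    hh = frame // (fps * 3600)
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

print(drop_frame_timecode(1800))   # first frame after one minute: 00:01:00;02
```

Notice that the count jumps straight from 00:00:59;29 to 00:01:00;02, which is exactly the "skipped numbers" behavior described above.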
If the notion of frame rate sounds suspiciously similar to that of sampling rate, there's good reason. A frame of film or video is analogous to a single sample word of PCM audio. Just as a minimum sampling rate is required for acceptable audio quality, a minimum frame rate is required for acceptable video quality. However, a frame of film or video can stand on its own as a still image, whereas an audio sample is useful only in context. Describing an image is therefore proportionally more complex than describing an audio sample.
Image Is Everything
FIG. 2: This figure shows a variety of film and video aspect ratios.
On a computer screen, the resolution of an image is defined by its width and height in pixels (from picture element); for example, an image might be 800 pixels wide and 600 pixels high. By contrast, the resolution of analog video is defined by its number of scan lines, which for NTSC is 480 visible lines (out of 525 total). Digital video retains the notion of scan lines, which determine vertical resolution, while defining horizontal resolution in pixels. Typical resolutions range from 704 × 480 for standard-definition digital television (SDTV) to 4,096 × 2,160 for Digital Cinema 4K, the current state of the art for theatrical presentation. The highest resolution allowed for high-definition (HD) digital television, also known as HDTV, is 1,920 × 1,080.
In analog television, each frame is displayed in two passes, or fields, doubling the effective refresh rate to prevent flicker. The odd-numbered fields consist of only the odd-numbered lines of the frame; the even-numbered lines are scanned in the even-numbered fields. This is known as interlacing, or interlaced scan. In digital video, it is common instead to draw every line of each frame in a single top-to-bottom pass, a technique called progressive scan. Progressive scan usually results in a clearer, sharper image than interlacing does. Video resolutions are often described in shorthand that drops the width and appends i for interlaced or p for progressive. For example, the two most common HDTV broadcast resolutions are 720p (meaning 720 progressively scanned lines, each 1,280 pixels wide), used by ABC and ESPN, and 1080i (meaning 1,080 interlaced lines, each 1,920 pixels wide), used by CBS and NBC.
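The relationship between frames and fields can be sketched in a few lines of Python, using a toy model in which each scan line is simply an item in a list:

```python
def split_fields(frame_lines):
    """Split a frame (a list of scan lines, top line first) into its two
    interlaced fields. A progressive display would instead draw all of
    frame_lines in one pass."""
    odd_field = frame_lines[0::2]    # lines 1, 3, 5, ... (1-based numbering)
    even_field = frame_lines[1::2]   # lines 2, 4, 6, ...
    return odd_field, even_field

frame = [f"line {n}" for n in range(1, 7)]   # a toy six-line frame
odd, even = split_fields(frame)
print(odd)    # ['line 1', 'line 3', 'line 5']
print(even)   # ['line 2', 'line 4', 'line 6']
```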
It's important to distinguish between resolution and aspect ratio. On a computer monitor, pixels are essentially square, so the ratio of an 800 × 600 image's width to height is 4:3 (or 1.33:1), the same shape as analog television (see Fig. 2). SDTV (704 × 480) uses rectangular pixels to achieve a 4:3 aspect ratio even though the ratio of its pixel counts works out to 4.4:3. HDTV's aspect ratio is 16:9 (or 1.78:1), while theatrical releases commonly use an aspect ratio of 1.85:1 or 2.39:1.
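A short calculation shows why SDTV's pixels must be non-square (the helper function below is purely illustrative, not drawn from any video library):

```python
def pixel_aspect_ratio(width_px, height_px, display_ratio):
    """Width-to-height ratio each pixel needs so that a width_px x
    height_px grid fills a display of the given aspect ratio."""
    return display_ratio / (width_px / height_px)

# Square pixels: an 800 x 600 computer image is already 4:3.
print(round(pixel_aspect_ratio(800, 600, 4 / 3), 3))   # 1.0
# SDTV: 704 x 480 needs pixels about 9% narrower than they are tall.
print(round(pixel_aspect_ratio(704, 480, 4 / 3), 3))   # 0.909
```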
If each pixel were either black or white, one bit per pixel would be sufficient to describe its state in a given frame. Representing shades of gray or color, however, requires multiple values per pixel. Computer displays address this directly, defining the color of each pixel by a binary number: 8-bit color provides 256 (2⁸) colors, 16-bit color provides 65,536 (2¹⁶) colors, and so forth. True-color display uses 24-bit words, allocating 8 bits each to the red, green, and blue (RGB) color channels. Variations include Digital Cinema's 12 bits per channel and a 32-bit variation of true color that allocates 8 bits to the alpha channel, a measure of the pixel's transparency. The alpha channel is used in computer-graphics manipulation but plays no direct role in video playback.
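The arithmetic behind these color depths, and the way a true-color pixel packs its three channels into one word, can be sketched as follows (the helper names are hypothetical):

```python
def color_count(bits: int) -> int:
    """Number of distinct colors available at a given color depth."""
    return 2 ** bits

print(color_count(8))    # 256
print(color_count(16))   # 65536
print(color_count(24))   # 16777216 (about 16.7 million colors)

def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack 8-bit red, green, and blue values into one 24-bit word."""
    return (r << 16) | (g << 8) | b

print(hex(pack_rgb(255, 255, 255)))   # 0xffffff (white)
```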
By the Numbers
Electronic musicians are well accustomed to multiplying bit depth by sampling rate by number of channels to calculate digital audio bandwidth and storage requirements, but video's extra spatial dimensions, with hundreds of thousands of pixels per frame, raise the stakes significantly. Consider the relatively modest example of SDTV: 704 × 480 equals 337,920 pixels per frame. Multiply that by 30 fps, and you get 10,137,600 pixels per second. At a color depth of 24 bits per pixel, that comes out to 243,302,400 bits per second, or about 1.7 gigabytes per minute.
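The arithmetic above is easy to reproduce, and to adapt to other formats, in a few lines of Python:

```python
# Uncompressed SDTV bandwidth, step by step.
width, height = 704, 480          # SDTV frame size
fps = 30                          # nominal NTSC frame rate
bits_per_pixel = 24               # true color

pixels_per_frame = width * height                       # 337,920
pixels_per_second = pixels_per_frame * fps              # 10,137,600
bits_per_second = pixels_per_second * bits_per_pixel    # 243,302,400
bytes_per_minute = bits_per_second // 8 * 60            # 1,824,768,000

print(f"{bytes_per_minute / 2**30:.2f} GiB per minute")   # 1.70 GiB per minute
```

Swap in HDTV's 1,920 × 1,080 frame and the per-minute figure grows by more than a factor of six, which is why compression matters even more at high definition.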
At that rate, even an HD DVD or Blu-ray disc would hold only about 9 or 15 minutes of video, respectively, so data compression is required to make digital video practical, especially at high resolutions and color depths. Like MP3, AC-3, and other audio codecs, video codecs take advantage of a variety of practical and perceptual coding techniques to bring the bit rate down to a manageable size.
FIG. 3: This is a partial list of video formats that are supported by QuickTime.
The most common digital video format standard, Apple's QuickTime (QT), is actually a container format rather than a codec, and it can be extended to support nearly any codec (see Fig. 3). As you watch a movie in QuickTime Player, you usually don't need to worry about whether the video uses the Sorenson 3 codec or MPEG-1; QT makes it all work the same way. You can even download additional compatible codecs for use within QuickTime.
A codec that is increasingly popular for its exceptional ratio of quality to bit rate is ITU-T H.264, also known as MPEG-4 Part 10. H.264 excels at both low- and high-bit-rate applications, and it is a mandatory video codec for both HD DVD and Blu-ray players. By reducing resolution and accepting lower-quality encoding, one can achieve a viable bit rate for almost any production or delivery scenario.
Just as audio-production standards adapt and grow, so do video-production and delivery standards. The race to higher-definition video formats and their associated delivery media ensures a lively learning environment for those of us who embrace the multimedia future.
Musician, educator, and author Brian Smithers lives in Orlando, Florida, with his wife Barb and their three cats. His latest book is Mixing in Pro Tools: Skill Pack (Thomson Learning, 2006).