Chances are you occasionally send rough mixes to clients or vie for jobs using compressed audio samples. But it's surprising how little some engineers and producers know about audio-compression techniques, especially considering that their art and their livelihoods can depend on them. Each of the available formats has unique strengths and weaknesses, and selecting the one that suits your needs is the Internet equivalent of choosing the right microphone or reverb.
FIG. 1: Both ID3v1 and ID3v2 tags can coexist for the same file. The two types can even contain different information.
Smart encoding practices can result in better-sounding files that can more accurately present your abilities to potential clients. They might even win you a contract over a competitor who just coasts along using a program's default settings. What's more, compatibility issues can arise from blindly using default compression methods; many programs subtly encourage you to use their native formats. Being careful about your choices can ensure that potential customers are able to listen to your clips.
I Can See Clearly Now
Numerous lossy audio encoders have emerged in the past few years, all of which are trying to beat the ubiquitous MP3 by offering sonic and technical advantages. Though each offers its own feature set and underlying technologies, what they all have in common is the goal of perceptual transparency.
Perceptual transparency is based on the idea that there is a certain threshold beyond which higher sound quality becomes largely inaudible and, as a result, functionally useless. This threshold will vary depending on the listener: EM readers probably have a higher average threshold than most consumers. When properly executed, lossy compression is tailored to the intended listener and will compress just enough so that any artifacts created lie barely outside the limits of audibility. That's precisely why MP3 files became so widespread: most people who were not expert listeners found that the audible artifacts from even relatively bad MP3s, such as those created at low bit rates by inferior first-generation encoders like Fraunhofer and Blade, were not troublesome enough to derail their listening experience. As a result, consumer adoption of the MP3 format spread quickly.
Tag, You're It!
But perceptual transparency is not the only goal. Every modern compressed audio format includes provisions for storing metadata, or data about data. Metadata is information about the music — artist, song title, album name, and so on. It's possible to accomplish the same thing using file names (“The Beatles - 1968 - The White Album - Side 3 of 4 - Track 4 of 7 - Everybody's Got Something to Hide Except Me and My Monkey.mp3,” for example), but there are programs that can read metadata and apply it to file names and directory structures across a large audio collection with a single click. That makes metadata infinitely more powerful than file names, because it allows you to reorganize your collection on a whim.
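The one-click renaming described above can be sketched in a few lines of Python. This is only an illustration: the naming pattern, the field names, and the `path_from_tags` helper are assumptions invented for the example, not the behavior of any particular tagging program.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming pattern; any layout built from the same fields would work.
DEFAULT_PATTERN = "{artist}/{year} - {album}/{track:02d} - {title}.mp3"

def clean(value):
    """Swap characters that are unsafe in file names for underscores."""
    if isinstance(value, str):
        return re.sub(r'[\\/:*?"<>|]', "_", value).strip()
    return value

def path_from_tags(tags, pattern=DEFAULT_PATTERN):
    """Build a relative path from a dictionary of metadata fields."""
    cleaned = {key: clean(value) for key, value in tags.items()}
    return PurePosixPath(pattern.format(**cleaned))

tags = {
    "artist": "The Beatles",
    "year": 1968,
    "album": "The White Album",
    "track": 4,
    "title": "Everybody's Got Something to Hide Except Me and My Monkey",
}
print(path_from_tags(tags))
```

Because the file name is derived from the tags rather than the other way around, changing the pattern string is enough to reorganize an entire collection.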
The ID3 standard emerged in the mid-1990s as a way of organizing metadata in a tag, which is a block of text attached to the audio file. This tag is arranged to make its contents easily readable by any program or hardware device accessing the file. It's prudent to set up your tags carefully, because they are an opportunity for you to achieve name recognition with clients and to ensure that you get proper credit for your work. Tagging is as easy as choosing Edit Info (or the comparable option) from the contextual menu in your MP3 playback program and filling in the data fields that appear.
The first iteration of the MP3 tag, ID3v1, was limited to 128 bytes of text, which meant that fields were sometimes too short for the data that belonged in them. ID3v2 solved this problem, but because the v2 tag is placed at the beginning of the file (rather than at the end as in v1), the process of tag writing can be slower. These days, most people use some form of ID3v2 (there are several subvariants), but it can't hurt to include both ID3v1 and ID3v2, especially because you never know which tag type your listeners' playback systems might be reading (see Fig. 1). If you have included only ID3v2 tags and your listeners have their programs or hardware set to read ID3v1, you'll show up as “Unknown Artist.”
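To make the size limit concrete, here is a minimal Python sketch of the fixed 128-byte ID3v1 layout: a "TAG" marker, 30 bytes each for title, artist, and album, 4 bytes for the year, 30 bytes for a comment, and 1 byte for a genre index. The helper names are invented for the example, and the genre value 255 is used on the assumption that players treat it as "unspecified." The last function checks for the "ID3" marker that opens an ID3v2 tag at the start of a file.

```python
import struct

# Marker, title, artist, album, year, comment, genre index: 128 bytes total.
ID3V1_LAYOUT = struct.Struct("<3s30s30s30s4s30sB")

def build_id3v1(title, artist, album, year, comment="", genre=255):
    """Pack the fields into a 128-byte ID3v1 block (long text is truncated)."""
    def field(text):
        return text.encode("latin-1", "replace")
    # struct pads short fields with zero bytes and cuts off long ones.
    return ID3V1_LAYOUT.pack(b"TAG", field(title), field(artist),
                             field(album), field(year), field(comment), genre)

def read_id3v1(data):
    """Return the ID3v1 fields found in the last 128 bytes, or None."""
    if len(data) < ID3V1_LAYOUT.size or data[-128:-125] != b"TAG":
        return None
    _, title, artist, album, year, comment, genre = ID3V1_LAYOUT.unpack(data[-128:])
    def text(raw):
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()
    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre": genre}

def has_id3v2(data):
    """An ID3v2 tag, when present, opens the file with the bytes 'ID3'."""
    return data[:3] == b"ID3"

long_title = "Everybody's Got Something to Hide Except Me and My Monkey"
tag = build_id3v1(long_title, "The Beatles", "The White Album", "1968")
print(len(tag), read_id3v1(tag)["title"])
```

Running it shows the song title cut off at 30 characters, which is exactly the limitation that ID3v2 was designed to remove.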
Because ID3v2 comments are stored in flexible frames with no practical length limit, they can be used to store fairly advanced information. iTunes stores volume-normalization information (Sound Check values) in a specially labeled Comment frame, and several programs can embed pictures of an album cover in a separate attached-picture frame. The Comment field is one of the most powerful tools available, because it's searchable and easily accessible by all major programs. If you're sending the files over the Internet, for example, you can use comments to plug your studio (be sure to include contact information). If you plan to use the files internally, why not store a list of the session players used on each track, complete with phone numbers in case you need to call them again for a similar project? You can also use it to make notes about alternate arrangements or mixes or mastering jobs, or rough recall information, or even outstanding unpaid balances from clients.
ID3v2 also includes a URL field in which you can specify a Web address; this is advisable because you never know when one of your tracks might fall into the hands of a major record-label executive or a potential client. The field is not readily accessible from iTunes, but there are a slew of programs that can access it from Windows, and Panic Audion (see the sidebar “Manufacturer Contacts” for a list of companies mentioned in this article) is a lightweight iTunes alternative for Mac OS X that provides easy access to the field. It's also wise to include the URL in the Comment field in case your eventual recipient is an iTunes user.
Although it is both useful and widespread, ID3 is a tag format that applies only to MP3 files. There are a few rogue programs that might occasionally try to force ID3 tags onto file formats that don't actually support them, such as WAV. This practice should be avoided, though, because it invariably violates the standard of the file type being mangled, leaving you with a nonstandard and possibly unplayable file. Instead, you should use the metadata format native to the file type you choose. There are as many tag varieties as there are audio formats, but with proper decoder implementation, the differences between them should be imperceptible to the end user.
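One way to avoid forcing the wrong tag type onto a file is to check what kind of file it really is before writing anything. The Python sketch below identifies common formats from their opening bytes and looks up the native metadata scheme for each. The function name and the lookup table are illustrative assumptions, and the check is deliberately simple: a raw MPEG audio frame and a raw AAC (ADTS) stream both begin with the same sync bits, so they share one label here.

```python
# First 16 bytes of an ASF file, the container used by WMA.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")

# Illustrative lookup: the metadata scheme each container uses natively.
NATIVE_TAGS = {
    "mp3": "ID3",
    "ogg": "Vorbis comments",
    "mp4": "MP4 metadata atoms",
    "asf": "ASF content description",
    "wav": "RIFF INFO chunk",
}

def sniff_audio_format(header):
    """Guess the container from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":
        return "mp4"
    if header[:16] == ASF_HEADER_GUID:
        return "asf"
    if header[:3] == b"ID3":
        return "mp3"
    # Eleven set bits mark the start of an MPEG audio frame (or ADTS AAC).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mpeg-audio"
    return "unknown"

sample = b"RIFF\x24\x00\x00\x00WAVEfmt "
kind = sniff_audio_format(sample)
print(kind, "->", NATIVE_TAGS.get(kind, "no entry"))
```

In practice you would read the first 16 bytes of the file and pass them to `sniff_audio_format` before deciding which tagger to run.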
Heir to the Throne
Advanced Audio Coding (AAC) is generally viewed as a technically superior successor to MP3. It is also championed by the Moving Picture Experts Group (MPEG), which infused the new format with everything it learned about human audio perception from its experiences with MP3 over the last decade. AAC is the audio layer of the MPEG-4 standard, which means that the AAC audio stream will often be placed inside an MP4 container file. This file usually has a .m4a file extension as prescribed by MP4 naming standards (though a .mp4 extension would also be valid).
FIG. 2: iTunes will use AAC encoding unless you specify otherwise in the Preferences menu.
Apple Computer has been the single greatest force in encouraging adoption of AAC, enabling support in iPods and iTunes and relying on it for the iTunes Music Store (see Fig. 2). However, audio files purchased from the iTunes Music Store are wrapped in FairPlay digital rights management (DRM) technology and use a .m4p extension. These files are very different from standard MP4 files because they are locked and copy protected. AAC will probably be around for quite a while, as it has the weight of both Apple and MPEG behind it. It provides an excellent ratio of file size to sound quality and thus is a great choice if you are not concerned about widespread compatibility.
Locks on Your Windows
Windows Media Audio (WMA) is a proprietary audio format developed by Microsoft, which claimed it would provide audio quality comparable to that of MP3 at half the file size. That claim has since been debunked — standard WMA files are no better than vanilla MP3s. Although the Pro version has proven to be a powerful, efficient codec, even providing support for 24-bit multichannel audio, it hasn't gained widespread popularity due to a complete lack of hardware support. A lesser concern has been Microsoft's attitude toward non-Windows operating systems: there's no official support for WMA in Linux (though it can be accomplished using the Xine playback engine), and the notoriously unstable Windows Media Player for Mac was recently discontinued. (Microsoft is now instructing users to install a third-party QuickTime input plug-in to get WMA playback on the Mac.)
FIG. 3: Enabling copy protection in Windows Media Player will lock out unintended users.
Nevertheless, WMA has managed to spread considerably due to its strong implementation of DRM. MP3 is an inherently insecure format, as it was developed before the online media explosion. WMA took shape as content providers became increasingly concerned about security, and as a result it includes proprietary licensing technology that has proven to be useful for many online music stores. In fact, this technology is at the heart of the PlaysForSure program, a Microsoft-led initiative designed to ensure compatibility among online stores and playback devices. Secure WMA is most appropriate for large distribution outlets that sell protected files to end users, for content providers, or for those working with sensitive material and high-profile clients who are concerned about leaks.
It is possible, however, to create your own protected WMA files using Windows Media Player (see Fig. 3). But because the WMA scheme stores authentication licenses separately from the audio files (in contrast to the iTunes model, which will authorize or deauthorize the entire collection at once), you may find yourself locked out of your own media if you haven't backed up your licenses and your hard drive crashes. (Always keep your source files as unprotected PCM audio files so you can reencode if necessary.) Moreover, almost any playback mechanism that supports WMA also supports MP3, so all in all, there's not much benefit to using WMA format.
What's in a Name?
Ogg Vorbis emerged several years ago as a free, open-source alternative to MP3 and WMA. The format consists of two distinct parts: Vorbis, the audio-compression codec, and Ogg, the container. This is a potent combination: Vorbis delivers great performance at any bit rate, and Ogg includes a powerful and flexible metadata system. In addition, Ogg Vorbis is the only encoding method to promise eventual support for bit rate peeling, a function that would allow scaling of high-bit-rate files down to lower bit rates without the quality degradation that usually results from reencoding a previously encoded file. (So far it has failed to deliver on this promise, however.)
Ogg Vorbis has been largely marginalized by the commercial support thrown behind AAC by Apple and iTunes. On the other hand, because it is open source, using it does not require the payment of royalty fees, such as those charged by the Fraunhofer Group for using its patented MP3 format. As a result, Ogg Vorbis is often used for embedded applications like video games, and it's the format of choice for Wikipedia for the same reason. Further, the metadata capabilities of Ogg are more advanced than those of most other audio formats, making it especially suitable for someone maintaining an extremely large library of compressed audio.
(Anything but) Lame
In spite of the strong showing from newer formats, MP3 should remain a viable option for some time. That's mostly because the LAME encoder, an open-source project, incorporates advanced options that have kept the format competitive. First and foremost is LAME's ability to vary bit rate over the course of the file. This allows the encoder to adapt to the complexity of the audio stream being encoded in order to use more bits when they're needed and to lower bit rate when the signal is relatively simple.
By default, LAME uses constant bit rate (CBR) encoding, which applies the same amount of compression to the entire file. But by using the “--abr” switch when invoking LAME (for example, “--abr 192”), you can enable average bit rate (ABR) encoding, which will vary the compression rate a modest amount as needed. Though constant bit rate encoding can sound fairly decent, ABR encoding will sound better than CBR at any given bit rate without increasing the size of the resulting file.
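The reason ABR does not change the file size is simple arithmetic: the size of the audio data is the average bit rate multiplied by the duration, however the bits are distributed along the way. The Python sketch below works through the numbers for a four-minute track. It assumes that kbps means 1,000 bits per second and ignores tags and header overhead, so treat the results as estimates; the function names are invented for the example.

```python
def estimated_size_bytes(bitrate_kbps, seconds):
    """Approximate size of a stream at a given average bit rate."""
    return round(bitrate_kbps * 1000 / 8 * seconds)

def pcm_size_bytes(seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Size of uncompressed PCM audio (CD-quality defaults)."""
    return round(sample_rate * (bit_depth // 8) * channels * seconds)

track_seconds = 4 * 60
mp3_size = estimated_size_bytes(192, track_seconds)   # CBR or ABR at 192 kbps
wav_size = pcm_size_bytes(track_seconds)

print(f"192 kbps MP3: {mp3_size / 1_000_000:.2f} MB")
print(f"CD-quality PCM: {wav_size / 1_000_000:.2f} MB")
print(f"Reduction: {wav_size / mp3_size:.2f} to 1")
```

A CBR file and an ABR file at the same nominal rate come out at roughly the same size; the difference lies in where the encoder spends the bits.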
A step above ABR is fully variable bit rate encoding (VBR), which holds the target quality steady and lets the bit rate fluctuate widely according to the input file. This will usually sound even better than CBR or ABR encoding at a comparable file size. VBR is selected with the “-V” switch followed by a quality level from 0 (highest quality) to 9 (for example, “-V 2”). Though some older hardware MP3 players had trouble decoding VBR files, VBR has now been around long enough that it is preferable in all but the most unusual situations. (One of the technically superior features of the Ogg Vorbis format is that it uses VBR encoding by default.) If your hardware has trouble with VBR files, it probably makes more sense to replace the hardware than to use poor encoding methods.
Trust the Experts
Modern versions of LAME include several different parameter presets that optimize encoding based on audio quality rather than file size. Ultimately, that's a much more useful metric of performance, and LAME undergoes extensive psychoacoustic testing before each release in order to make it possible. Early pro-MP3 claims promised perceptual transparency at bit rates as absurdly low as 128 kbps; this was soon found to be poppycock. LAME's “--alt-preset standard” (APS) option promises the same but is much more likely to deliver. APS is widely regarded as the new standard for MP3 encoding and works by analyzing the source file to determine the least destructive ways it can save bits. It then trims the file size through a combination of filters, noise shaping, and joint stereo representation.
FIG. 4: You can set options in LAME and many other encoders by using special command-line settings, such as those shown here.
LAME offers other presets as well. APE, or “--alt-preset extreme,” applies the same principles as APS using a slightly higher threshold of perceptual transparency. (The developers insist that the differences are merely theoretical and probably aren't audible to everyone.) API, or “--alt-preset insane,” is functionally equivalent to the “-b 320” option, as it encodes every audio frame at the highest bit rate allowed by the MP3 specification.
More recent releases of LAME have moved from text-based preset names to a numeric system in which lower numbers mean higher audio quality; “-V 2” is the new APS and “-V 0” is the new APE. Currently, both naming conventions will work, and both map to the same presets (see Fig. 4). That may change in future revisions of LAME, however.
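If you encode many files, it can help to assemble the LAME command line in a short script instead of typing it each time. The Python sketch below only builds the argument list; it assumes a `lame` program is installed and on your path, and that your version accepts the `-b`, `--abr`, and `-V` options (check its built-in help if in doubt). The `lame_command` helper and the file names are invented for the example.

```python
def lame_command(source, target, mode="vbr", value=2):
    """Return the argument list for one LAME encoding job.

    mode "cbr": value is a fixed bit rate in kbps (-b)
    mode "abr": value is a target average bit rate in kbps (--abr)
    mode "vbr": value is a quality level, 0 (best) to 9 (-V)
    """
    if mode == "cbr":
        options = ["-b", str(value)]
    elif mode == "abr":
        options = ["--abr", str(value)]
    elif mode == "vbr":
        if not 0 <= value <= 9:
            raise ValueError("VBR quality must be between 0 and 9")
        options = ["-V", str(value)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["lame", *options, source, target]

# Hypothetical file names, shown for illustration only.
for mode, value in [("vbr", 2), ("vbr", 0), ("abr", 192), ("cbr", 320)]:
    print(" ".join(lame_command("mix.wav", "mix.mp3", mode, value)))
```

To run a job for real, pass the returned list to `subprocess.run(command, check=True)`. Keeping the options in one function makes it easy to reencode a whole folder from the original WAV files if you later change your preferred preset.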
Meet Your Arsenal
Many new digital audio sequencers include the ability to export mixes in a compressed format, but because they don't always support the more advanced encoding options, you may have to look elsewhere. In addition to having the tagging capabilities discussed earlier, Panic's Audion is a very capable OS X front end for LAME. There is also an AppleScript for the OS X version of iTunes called iTunes-LAME, developed by Blacktree, that overrides the default encoder and uses LAME instead, with all the special command options intact. Note that you have to start the importing process from the AppleScript menu in order to use LAME; using the normal buttons in iTunes will invoke the default encoder.
FIG. 5: Many Windows programs, including Winamp, can be used as front ends for a number of different encoders.
In Windows, LAME can be accessed through a number of different programs, including Nullsoft's Winamp, Albert Faber's CDex, Illustrate's dBpowerAmp, and Andre Wiethoff's Exact Audio Copy. Unlike iTunes, which ships with its own built-in encoders, these programs can perform encoding tasks by accessing outside programs, so they can be used to compress files in any of the other formats mentioned earlier (see Fig. 5). Perhaps the most robust of all is Peter Pawlowski's free foobar2000, an astonishingly deep Windows-only program that is the Swiss Army knife of compressed audio.
Choose Your Weapon
Lossy audio codecs can sound very different, and if you're planning to use compressed files internally, pick a format based on your specific needs, then tailor the compression settings around the results of your own listening tests. If you are compressing your files in order to share them with clients and potential customers, stick with MP3 for compatibility, but consider using the LAME encoder with one of the high-quality VBR presets to get a sonic edge on the competition.
No matter what encoding methods you use, though, keep your original high-resolution files handy. If you eventually need to switch to another format, you won't want to transcode from one lossy audio format to another, as this just compounds the fidelity loss. Instead, you'll need to recompress from the original WAV or AIFF file, or rerip the tracks from the source CD. (You can't use a CD burned from lossy files as a source, because your system can't magically restore the data that was lost when those files were compressed.)
Although the most recent wave of audio formats allows for lossless compression that does not compromise fidelity at all, compatibility concerns and the need for rapid Internet file transfers mean that lossy compressed audio files still have a place in every musician's studio. Don't be afraid to experiment with the many available compression tools to determine which works best for you. With a little technical savvy, you'll be amazed at how good a modern encoder can sound.
Vijith Assar works at the Music Resource Center in Charlottesville, Virginia, and writes for the local newspaper. Visit him online at www.vijithassar.com.