Making Tracks: Free Speech

GET CREATIVE WITH SOFTWARE SPEECH SYNTHS

BONUS MATERIAL: Web Clips are audio clips that accompany this article.

FIG. 1: Online speech synthesizers like Cepstral offer a quick way to add spoken annotations to your tracks.

I've long been fond of speech synthesizers; somehow their robotic sound heightens the humanity of the rest of a song. When mixed in more subtly, synthetic mumbles and murmurs draw the ear by tickling the subconscious.

Speech synths are also handy for making practical sounds, such as alerts (“MIDI received!”), channel IDs (“Left … Right … Center”), and announcements (“1 kHz at -10 dB”). To make quickie IDs like those, I usually turn to the online AT&T Natural Voices speech synth, which generates a downloadable WAV file in a variety of interesting voices when you type in the text. Feeding the foreign voices English phrases is especially entertaining (see Web Clip 1).

With a stream ripper like Ambrosia WireTap (Mac) or Applian Freecorder (Win), you can capture the output of other online speech demos. My favorites are Cepstral and Loquendo. Cepstral offers a range of comedic voices, including a raging drill sergeant, a demon, and a terrific whisper (see Fig. 1 and Web Clip 2). Loquendo includes vocal sound effects and responds expressively to exclamation points (see Web Clip 3).

Unfortunately, these online synths can be slow to render audio. More important, their output can't legally be used in commercial projects unless you pay hefty licensing fees. They were developed for the telephone systems of giant corporations, not for musicians. The irony is that like Yamaha Vocaloid (which is designed for musicians), the corporate synths can sound a little too realistic. The better they sound, the more they resemble an excessively Auto-Tuned human vocalist — the technology starts to suck out the personality.


For synthetic vocal effects with character, I like to go old school, tapping the wheezy, grinding robo-voices that lurk inside Mac OS X and Windows. Here's how to play them while you still can. Windows Vista contains a significantly smoother voice called Microsoft Anna, and Mac OS X Leopard features an even more realistic voice called Alex, which “breathes” between phrases. Like the AT&T, Cepstral, and Loquendo synths, Anna and Alex create sound by splicing huge databases of sampled syllables into new combinations. In contrast, the older voices discussed here are synthesized on the fly in gritty, low-res glory. (Some of them rely on samples, too — but brief, crunchy ones.)

To access these voices, I use two helper programs: Balabolka (Win) and Vox Machina (Mac). There are many other choices, as well as alternative low-res synths such as Melody Assistant and VocalWriter, but Balabolka and Vox Machina are baby simple to use and are small, flexible, and free.


Start by downloading Balabolka (Russian for “chatterer”). The download page offers additional free voices, but you can also grab them later from links inside the program's excellent help file.

Windows XP has the Microsoft Sam voice, which one developer describes as “the gravelly guy who sounds like he just drank a fifth of bourbon.” Sam's gone from Vista, so I downloaded him as well as Sylvia, an Italian voice (see Web Clips 4 and 5). To use the older “SAPI 4” voices, you may need to install Microsoft's speech driver, spchapi.exe; there's a download link to that in the Balabolka help file, too.

Type or paste some text into Balabolka's main window, and click on Play to try out the voices. You can also control playback with your PC's F5 and F6 keys. There are global sliders for pitch and rate, but you can also alter individual words or syllables by wrapping them in XML tags, such as the tag for emphasis. The help file has a complete list (see “Step-by-Step Instructions,” 1 through 3).
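To make the tag-wrapping idea concrete, here is a minimal Python sketch that builds marked-up text you could paste into Balabolka. It assumes the SAPI 5 style tags `<emph>` and `<pitch middle="…">` and an offset range of -10 to 10; check both against the Balabolka help file before relying on them. The helper names `emph` and `pitch` are made up for this example.

```python
from xml.sax.saxutils import escape


def emph(text):
    """Wrap text in an emphasis tag (assumed SAPI 5 style)."""
    return "<emph>" + escape(text) + "</emph>"


def pitch(text, middle):
    """Wrap text in a relative pitch tag.

    `middle` is an assumed offset from -10 (lowest) to 10 (highest).
    """
    if not -10 <= middle <= 10:
        raise ValueError("middle should be between -10 and 10")
    return '<pitch middle="{:d}">{}</pitch>'.format(middle, escape(text))


# Build one line of tagged text: plain word, emphasized word, raised word.
line = " ".join(["MIDI", emph("received"), pitch("now", 5)])
print(line)
```

Because the tags wrap the words they affect, they nest naturally: `emph(...)` output can be passed through string joins alongside plain words, although `pitch` and `emph` here both escape their input, so nesting one helper inside the other would need a small change.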

I like entering bursts of nonsense words to create interesting rhythms; this is one vocalist who will never complain! When you're happy with the results, click on the WAV button to export the performance as a WAV or an MP3.


The speech-making process is similar on the Mac, although the syntax for modifying sounds is more squirrelly. Download Vox Machina, enter some text, and watch the creepy animated lips flap.

Some of the Mac voices, like Organ, Bells, and Cellos, have built-in melodies. For others, you can assign pitches with the [[PBAS]] (pitch basis) tag. Instead of wrapping the word, as in Windows, the Mac tag precedes it. [[PBAS +2]], for example, will raise the relative pitch one whole step (two semitones) (see Web Clip 6). PMOD (pitch modulation) is an especially dramatic parameter. And if you've installed Apple Developer Tools, check out Repeat After Me, a program that analyzes the pitch and rhythm of sampled audio and generates tagged text for the speech synth; the result is like low-res physical modeling (see “Step-by-Step Instructions,” A through C).
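Typing a tag before every syllable gets tedious, so here is a small Python sketch that generates the tagged text for a melody. It is only an illustration: the function name `tag_melody` is hypothetical, the absolute pitch numbers assume a MIDI-style scale (60 = middle C), and the leading [[PMOD 0]] assumes that a modulation of zero holds each pitch steady. The notes are the opening phrase of “Happy Birthday.”

```python
def tag_melody(syllables, notes, flatten=True):
    """Return text with a [[PBAS n]] tag placed before each syllable.

    syllables: list of strings to speak
    notes:     list of pitch numbers (assumed MIDI-style, 60 = middle C)
    flatten:   if True, start with [[PMOD 0]] (assumed to steady the pitch)
    """
    if len(syllables) != len(notes):
        raise ValueError("need exactly one note per syllable")
    parts = ["[[PMOD 0]]"] if flatten else []
    for syllable, note in zip(syllables, notes):
        # The Mac tag precedes the word it affects.
        parts.append("[[PBAS {}]] {}".format(note, syllable))
    return " ".join(parts)


syllables = ["Hap", "py", "birth", "day", "to", "you"]
notes = [55, 55, 57, 55, 60, 59]  # G G A G C B
tagged = tag_melody(syllables, notes)
print(tagged)
```

Paste the printed text into Vox Machina and pick a voice. Splitting words into separate syllables makes the synth pronounce each piece on its own, so expect to respell a few of them by ear.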

Of course, most of the musical fun comes from what you do with the raw materials that these speech synths spit out. Chop the voices into syllables, and then load them into a sampler for pitched playback. Add delay, chorus, or reverb to smooth the jagged edges. Reverse the sounds, and mix them in softly to create spooky muttering.
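If you'd rather script the chopping and reversing than do it by hand in an editor, the following Python sketch shows one way, using only the standard library. It assumes uncompressed PCM WAV data and chops into equal-length slices rather than detecting real syllable boundaries; the function names and the 220 Hz test tone that stands in for an exported voice file are made up for the example.

```python
import io
import math
import struct
import wave


def write_wav(fileobj, nchannels, sampwidth, framerate, frames):
    """Write raw PCM frames to a WAV file or file-like object."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(nchannels)
        w.setsampwidth(sampwidth)
        w.setframerate(framerate)
        w.writeframes(frames)


def read_wav(fileobj):
    """Return (params, raw frame bytes) from a WAV file or file-like object."""
    with wave.open(fileobj, "rb") as w:
        return w.getparams(), w.readframes(w.getnframes())


def reverse_frames(frames, frame_size):
    """Reverse audio frame by frame so multi-byte samples stay intact."""
    chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
    return b"".join(reversed(chunks))


def chop(frames, frame_size, frames_per_slice):
    """Split audio into equal-length slices (the last one may be shorter)."""
    step = frame_size * frames_per_slice
    return [frames[i:i + step] for i in range(0, len(frames), step)]


# Stand-in for an exported voice file: 0.1 s of a 220 Hz tone, mono, 16-bit.
rate = 8000
samples = [int(12000 * math.sin(2 * math.pi * 220 * n / rate)) for n in range(800)]
source = io.BytesIO()
write_wav(source, 1, 2, rate, struct.pack("<{}h".format(len(samples)), *samples))
source.seek(0)

params, frames = read_wav(source)
frame_size = params.sampwidth * params.nchannels
backwards = reverse_frames(frames, frame_size)
slices = chop(frames, frame_size, 200)

# Save the reversed version; pass a filename instead of BytesIO to write to disk.
reversed_out = io.BytesIO()
write_wav(reversed_out, params.nchannels, params.sampwidth, params.framerate, backwards)
print(len(slices), "slices;", len(backwards) // frame_size, "frames reversed")
```

To use it on a real file, replace the test-tone section with `read_wav("voice.wav")` (a hypothetical filename), then write each slice to its own WAV file for loading into a sampler.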

Finally, a tip: my absolute favorite low-res voices come from an ancient Windows 95 program called Talk It. This purple gem still runs on Vista, and its FM-synthesized vocal stylings are cooler than ever (see Web Clip 7). It is available as a free download.

David Battino is the coauthor of The Art of Digital Music and the audio editor for the O'Reilly Digital Media site.



Step 1: Hey, meat sack! Microsoft dropped the Bender-esque Sam voice from Vista. Get it back with the link inside Balabolka's help file.



Step 2: Add expression by wrapping words in XML tags.



Step 3: To capture the vocal stylings of Web sites or nonrendering software, resample the PC's WaveOut Mix.



Step A: Enter some text in Vox Machina, and then select a voice. The four numbered voices (circled) are Cepstral voices I downloaded.



Step B: Add expression by inserting tags before the words you want to affect. Here, I inserted the pitches of “Happy Birthday.” Words turn red as they're spoken.



Step C: To capture the sound of Web sites or software that can't generate audio files, resample your Mac internally. Here, I'm using the WireTap stream ripper.
