Talk Is Cheap - EMusician

Talk Is Cheap

As any Star Trek fan will tell you, computers are supposed to listen when you speak to them. Issue a command ("Computer, run a level 1 diagnostic"), and the attentive circuitry responds appropriately. In one memorable scene from a Star Trek movie, the crew had traveled back in time, and Scotty, seeing a Macintosh for the first time, picked up the mouse and spoke into it, thinking that it was a microphone. Audiences laughed knowingly, but that telling scene unearthed a deeper question: will the computer mouse eventually be replaced by a microphone as an input device?

My guess is that it won't happen right away. After all, we're still typing words and commands on a keyboard that was modeled after a product of the industrial revolution. Nevertheless, voice-recognition technology has made great strides in recent years and is already in widespread use on both the Mac and PC platforms.

In fact, with a little ingenuity, you can operate your desktop studio by merely barking commands into a microphone. Imagine standing several feet away from your computer with both hands on your MIDI guitar controller and recording, playing, stopping, and rewinding your favorite sequencer simply by speaking into a headset mic. Perhaps you'd rather use your voice to call out note durations while step-entering music in your notation program. Voice-recognition technology holds great potential for musicians, and it's relatively easy to get started. I'll explore some general issues and then provide step-by-step instructions for using the Mac's PlainTalk software as an example of how speech-recognition technology can be applied in the studio.

Before we continue, however, let's distinguish between the two main categories of speech-recognition systems: dictation programs and voice-command programs. Dictation programs convert spoken words directly into text in nearly real time. They can be a great aid to people with disabilities, and they're also gaining popularity in the business community and in professions where it's hard to work and type at the same time. Several companies now offer sophisticated dictation programs for the Windows platform, including Dragon Systems (www.dragonsys.com), which sells a range of programs at different prices and for different purposes; and IBM (www.software.ibm.com/speech), which markets a line of products based on the company's own ViaVoice technology.

Macintosh users have fewer options when it comes to dictation software, but the Mac does have one big advantage over Windows PCs: its operating system software includes a sophisticated and surprisingly robust speech-recognition technology called PlainTalk. PlainTalk doesn't perform dictation, but it does offer voice-command ability. This means that it can recognize spoken words or phrases that then trigger actions or initiate commands. (Actually, PlainTalk itself doesn't initiate commands; it simply tells a client application what it heard, and the app decides what to do about it.) PlainTalk is just what you need to operate a desktop studio, and it's readily available to anyone with a Power Mac and Mac OS 7.5 or later.

Let's take a closer look at the Mac's approach to speech technology and how to harness it for musical purposes. Windows users should contact the companies mentioned earlier to see what resources are currently available for voice-command operations on that platform.

PLAIN TALK ABOUT PLAINTALK

Apple's PlainTalk software actually covers two areas of voice technology: speech recognition and speech synthesis. Speech synthesis is essentially the opposite of dictation; it converts text into spoken words (synthesized speech). Mac users can choose from several voices that are included with the operating system software and can have text read to them from SimpleText documents or dialog boxes. Speech synthesis, however, is of limited value in the studio. In fact, I recommend disabling PlainTalk's text-to-speech option so that you won't have annoying alert dialogs read to you when you're trying to concentrate during a recording session.

PlainTalk's speech-recognition component, on the other hand, can enhance the Mac's graphic interface by freeing your hands from the tyranny of the keyboard and the mouse. Moreover, PlainTalk boasts several features that add to its effectiveness and make it easy to use. For example, it is a speaker-independent system. Unlike some applications that must be "trained" to recognize your voice, PlainTalk works with any male or female adult speaker of North American English. Voice samples of more than 500 adults from different parts of the continent were used to develop the acoustic models, which the software uses to recognize speech patterns. (According to Apple, PlainTalk may not work reliably if you speak with a heavy accent, and it doesn't work well for children because the spectral characteristics of their speech differ from those of adults.)

PlainTalk is a "continuous-speech recognition" technology, which means that you don't have to add ... a ... pause ... between ... words when you speak, unlike earlier "isolated-word" systems. In fact, PlainTalk works best if you speak clearly and naturally in fluent phrases or sentences. PlainTalk is also tolerant of extraneous noises (such as coughs or door slams) and of different kinds of acoustic environments (a studio versus a conference room, for example). In addition, the program uses a flexible "finite-state" grammar, which defines the commands it will respond to. You provide PlainTalk with a set of phrases (or sentences or word groups) to listen for; other speech patterns are simply ignored.
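To make the idea of a finite-state grammar more concrete, here is a minimal Python sketch. It is not Apple's implementation: the grammar, the function names, and the use of typed text in place of audio are all assumptions made for illustration. The sketch expands a tiny grammar into the finite set of phrases it accepts and ignores anything else, which mirrors the behavior described above. A real recognizer compares sound against acoustic models rather than comparing strings.

```python
from itertools import product

# Hypothetical grammar: each inner list holds the words allowed in that slot.
GRAMMAR = [
    ["open", "close"],
    ["my"],
    ["demo song", "mail"],
]

def expand(grammar):
    """Return the finite set of phrases that the grammar accepts."""
    return {" ".join(words) for words in product(*grammar)}

def normalize(utterance):
    """Lowercase the text, drop punctuation, and collapse extra spaces."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def recognize(utterance, phrases):
    """Return the matching phrase, or None if the utterance is outside the grammar."""
    text = normalize(utterance)
    return text if text in phrases else None

PHRASES = expand(GRAMMAR)

print(sorted(PHRASES))
print(recognize("Open my demo song", PHRASES))   # matches a phrase in the grammar
print(recognize("What time is it?", PHRASES))    # not in the grammar, so it is ignored
```

Because the set of valid phrases is small and known in advance, the listener only has to decide which phrase (if any) it heard, a far easier job than open-ended dictation.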

Actually, you don't have direct access to PlainTalk itself; it's simply a set of programming tools that control speech recognition. To use it, you need an application that incorporates PlainTalk speech-recognition code. Fortunately, Apple includes a simple Finder utility called Speakable Items, which you can start using right away.

SPEAK EASY

Speakable Items works by scanning the contents of the Speakable Items folder (located inside the Apple Menu Items folder). When it hears the name of one of the items in the folder, it responds by double-clicking that item. You could, for example, create an alias of your e-mail program, rename it "get my mail," and drop it into the Speakable Items folder. Then whenever you say, "Get my mail," the Speakable Items utility launches your e-mail program. It's that simple.

I was recently working on an audio file in BIAS Peak, and when I finished my editing for the day, I saved the audio file as a Peak document. Next, I made an alias of the file, named it "open my demo song," and placed the alias in the Speakable Items folder. Now when I boot my computer in the morning, I simply say, "Open my demo song," and the computer automatically launches Peak and opens the audio file in the waveform display. What's more, I don't have to dig through multiple folders to locate the file or the application.

Of course, Speakable Items must be installed on your Macintosh before you can use it. If you're not sure that it is, check the Apple menu and see if the Speakable Items folder is listed. Also, check the Extensions folder to verify that the Speakable Items, Speech Manager, and Speech Recognition extensions are present. Next, check the Control Panels folder and locate the Speech control panel (see Fig. 1). From the drop-down menu, choose Speakable Items, and then click the button that turns it on. You must also plug a microphone into the Mac (unless your monitor has one built in) and set the input device to External Mic in the Monitors & Sound control panel.

If Speakable Items is not already on your computer, you can install it directly from the Mac OS 8.5 CD-ROM. Open the English Speech Recognition folder, located inside the disc's Software Installers folder, and double-click the installer icon. If you don't have the Mac OS 8.5 disc, you can download the necessary software from Apple's Web site (www.apple.com/macos/speech). The version of PlainTalk that comes with Mac OS 8.5 (version 1.5.3) offers better performance than the earlier versions; it does not, however, support the iMac or the newest G3 computers. An update is currently available with Mac OS 8.6 (free to Mac OS 8.5 owners at Apple's Web site).

So you'll know that the computer is listening to you, the Speech Recognition extension includes a floating Feedback window (see Fig. 2). This window displays an animated character (there are several to choose from; my favorite is Phil) and a text field that shows the commands to which the computer has responded.

The Speech control panel lets you determine how and when the computer listens to commands. One option, which Apple calls the "push-to-talk" method, lets you assign a key that triggers the listening mode; the Escape key is the default setting. With this option, the computer listens for commands only when the assigned key is held down. This approach affords maximum control over the voice-recognition activity and minimizes "misfires" caused by incidental conversation. But it ties up one hand and forces you to stay close to the keyboard, so it's not always the best choice for a desktop studio.

The option that I prefer uses a spoken word (or phrase) instead of a key. With this method, the computer listens all the time but responds only to commands that follow the assigned word. "Computer" is the default word, and it works quite well for me. So when I open the audio file mentioned earlier, I actually say, "Computer, open my demo song." Having a special trigger word reduces misfires, although it's still not as effective as the push-to-talk method. You'll get the best results if you choose a word or phrase that isn't likely to come up during normal conversation. Also, avoid words that sound like other common words that you would frequently use.

The least reliable option does away with trigger words and keys altogether. The computer simply listens all the time and tries to respond whenever possible. If you're working alone, this method might work fine, but it's generally more prone to misinterpretation. And if there are other people (that is, other voices) in your studio, it can definitely lead to problems.
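The three listening methods differ only in when an utterance is allowed through to the command matcher. The following Python sketch illustrates that gating logic under stated assumptions: the mode names, the function name, and the use of text in place of audio are hypothetical and are not taken from the Speech control panel itself.

```python
def extract_command(utterance, mode, key_held=False, trigger="computer"):
    """Return the command text to act on, or None if the utterance is ignored.

    mode is one of "push_to_talk", "trigger_word", or "always"
    (hypothetical names for the three methods described above).
    """
    # Lowercase the text and drop punctuation so "Computer," matches "computer".
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    words = kept.split()
    if not words:
        return None

    if mode == "push_to_talk":
        # Listen only while the assigned key is held down.
        return " ".join(words) if key_held else None

    if mode == "trigger_word":
        # Respond only to commands that follow the trigger word or phrase.
        lead = trigger.lower().split()
        if words[:len(lead)] == lead and len(words) > len(lead):
            return " ".join(words[len(lead):])
        return None

    if mode == "always":
        # Try to respond to everything that is heard.
        return " ".join(words)

    raise ValueError(f"unknown mode: {mode}")


print(extract_command("Computer, open my demo song", "trigger_word"))
print(extract_command("Open my demo song", "trigger_word"))
print(extract_command("Open my demo song", "push_to_talk", key_held=True))
```

In the sketch, casual conversation reaches the matcher only in the "always" mode, which is why that method is the most prone to misfires.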

SAY WHAT?

Once you have Speakable Items up and running, you can explore its potential as a control interface for desktop studio operations. As I described earlier, you can open documents and folders and launch applications with Speakable Items, but that's only scratching the surface of what this utility can do. That's because Speakable Items can also be used to launch AppleScript scripts.

AppleScript is Apple's macro-creation and automation language. It enables you to perform multiple operations by simply double-clicking a script icon or accessing a script from within an application. The Speakable Items folder includes several scripts that perform such simple tasks as closing all windows, setting the computer's volume to maximum, or viewing a window's files by date. There's even a script to make a new file into a "speakable" item. (Just select the document and say, "Computer, make this speakable," and you're done.)

Scripts are created and edited in the Script Editor, which resides in the AppleScript folder inside the Apple Extras folder. By combining AppleScript with Speakable Items, you can gain voice control over a wide range of functions in any program that supports AppleScript. You can tell if an application is "scriptable" by dropping the application icon onto the Script Editor icon. If the program is scriptable, a window will open showing the program's AppleScript dictionary (see Fig. 3).

With a scriptable program, you can create a macro by entering text directly into the Script Editor. You might, for example, create a script that tells your sequencer to start recording. You would then save the script and name it "start recording." If your trigger word is "computer," you would simply say, "Computer, start recording," and your sequencer would enter recording mode. There are several good books that give a full explanation of how to use AppleScript and the Script Editor. You can also get detailed information about using AppleScript by visiting Apple's Web site (www.apple.com/applescript).

NOT MY TYPE

Most current audio and music programs have no direct support for AppleScript. There is, however, a workaround for this problem: you can use a PlainTalk-enabled macro program to fake a keyboard command in response to a spoken word or phrase. Some macro-creation programs, such as QuicKeys, are AppleScript compatible, so you can use them with Speakable Items to trigger keyboard events that affect the frontmost application (your sequencer, for example). For an even easier (and cheaper) solution, download a copy of Michael Kamprath's Speech Typer (www.kamprath.net/claireware/speech_typer.html), a simple $15 shareware utility that converts spoken words into keystrokes.

Speech Typer consists of two parts: the Engine, a background-only application that does most of the work; and the Controller, which is used to edit phrases and set preferences (see Fig. 4). The Speech Typer Engine listens continuously. When it recognizes a phrase from its Listen Phrase list, it types the corresponding user-defined Response Phrase, which can be anything from a single keystroke to a full business memo. (Speech Typer uses the same trigger word that you have set up in the Mac's Speech control panel.) In many ways, Speech Typer is similar to Speakable Items, except that you're sending keystrokes to the computer rather than double-clicking items. And that's just what you need in order to operate your favorite sequencer with voice commands.

Most high-end sequencers (such as MOTU Digital Performer, Steinberg Cubase VST, Emagic Logic Audio, and Opcode Vision DSP) allow you to assign keyboard commands to trigger almost any function in the program (see Fig. 5). You can cut, copy, and paste; operate the transport controls; and select note values for step entry and quantizing. Unfortunately, Speech Typer doesn't use the Mac's Control, Option, or Command keys, so you'll probably have to reassign many of the key commands in your sequencer. Nevertheless, there are plenty of keys to go around for most tasks. For example, I set up Cubase VST with the following key equivalents: Record = R, Play = P, Stop = S, Rewind = W. Then I set up Speech Typer to type those letters when I speak the corresponding words. Now I can sit with both hands on my MIDI guitar controller, and when I say, "Computer, record," Cubase goes into Record mode and I can start playing. I can also perform other common tasks, like turning the metronome on and off, by expanding my list of commands.
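The phrase-to-keystroke setup just described boils down to a lookup table. Here is a minimal Python sketch of that idea, using the four Cubase VST key assignments listed above. It is an illustration only, not Speech Typer's actual code: the names RESPONSE_KEYS and keystroke_for are hypothetical, typed text stands in for recognized speech, and the function returns the key instead of sending it to the frontmost application.

```python
# Listen phrase -> response keystroke, matching the key equivalents above.
RESPONSE_KEYS = {
    "record": "R",
    "play": "P",
    "stop": "S",
    "rewind": "W",
}

def keystroke_for(utterance, trigger="computer", response_keys=RESPONSE_KEYS):
    """Return the key to type for a spoken command, or None if nothing matches."""
    # Lowercase the text and drop punctuation so "Computer," matches "computer".
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    words = kept.split()
    # Ignore anything that does not begin with the (single-word) trigger.
    if not words or words[0] != trigger:
        return None
    phrase = " ".join(words[1:])
    return response_keys.get(phrase)


print(keystroke_for("Computer, record"))        # prints R
print(keystroke_for("Computer, fast forward"))  # prints None (no such phrase in the list)
```

Expanding the command list is just a matter of adding entries to the table and assigning the same single keys inside the sequencer.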

STATE OF THE ART

Speech technology does work, and it's lots of fun to play around with, but it's still far from perfect. Furthermore, it may be impractical in many studio settings. For example, if you record directly into an audio-editing program or audio/MIDI sequencer, you can't tie up your computer's Mic input for issuing voice commands. In fact, some music and audio programs are simply incompatible with PlainTalk.

You'll also have to get used to a bit of lag time when controlling a sequencer by voice. When you speak a command, Speech Typer must listen and recognize the speech pattern by comparing it to its list of commands. It then types the keystroke, which the sequencer must recognize and respond to. Each of these steps takes time; you can reasonably expect a one- or two-second delay after you speak a command before things start to happen. That pretty much rules out things like punch-in/out recording, where speed and precision are essential.
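To put that delay in musical terms, it helps to convert seconds into beats. The short Python calculation below does so; the 120 bpm tempo is my own assumption for illustration, and the function name is hypothetical.

```python
def beats_elapsed(delay_seconds, tempo_bpm):
    """Return how many beats go by during a delay at the given tempo."""
    return delay_seconds * tempo_bpm / 60.0

# The one- and two-second delays mentioned above, at an assumed 120 bpm.
for delay in (1.0, 2.0):
    beats = beats_elapsed(delay, 120)
    print(f"{delay:.0f} s at 120 bpm = {beats:.0f} beats")
```

At that tempo the delay amounts to two to four beats, or as much as a full bar of 4/4, which is far too coarse for a punch-in.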

Moreover, Speech Typer doesn't always recognize my commands the first time around, so I often have to repeat myself. I find that enunciating clearly improves my success rate, although it's never 100 percent effective. Some words consistently give me trouble and have to be changed to more easily recognized phrases. For example, "rewind" seems to cause confusion, so I use the command "go back," which works much better. In addition, some microphones work better than others. I found that Apple's PlainTalk mic worked the best of several inexpensive dynamic mics that I tried.

In spite of their shortcomings, however, Speakable Items, Speech Typer, and other PlainTalk-enabled programs hold much promise for the future. Apple has made a serious commitment to speech technology (it has been shipping speech-related products since 1993), and you can expect to see more widespread use of voice-recognition tools in the coming years. For now, you can enjoy a glimpse of what lies ahead in computer interface design. And just remember: the next time you start to curse at your computer, it might be listening.

Associate Editor David Rubin talks to his computer only when he has something important to say. Special thanks to Tom Bonura of Apple Computer for his help in preparing this article.