To be clear at the outset, your lead singer is in no danger of being replaced by Cantor, VirSyn's new voice-synthesis software. Desktop speech synthesis has not yet reached that level, although it may in the future. To quote Cantor's designer, Harry Gohs, “Keep in mind that Cantor was not built to replace humans in this field; its main goal is to open a new playing ground for composers and sound designers to explore the exciting new area of ‘language sound systems.’” To that end, Cantor does not disappoint.
FIG. 1: VirSyn Cantor's Score editor is used for entering and editing notes and lyrics.
Cantor has a standalone application as well as VST and AU format plug-ins for Mac OS X and Windows XP. Support for RTAS and Propellerhead's ReWire is planned for a future release.
Speech synthesis is a complex process. You'll need a fast CPU for Cantor to “sing” multiple parts while other audio tracks and plug-ins are playing. For example, using Cantor standalone to sing four-part harmony on a 3.4 GHz Pentium 4 processor running Windows XP or a dual G5 2 GHz processor running Mac OS X gobbles up roughly 25 percent of the CPU.
IN A WORD
Speech synthesis — first mechanical, then electro-mechanical, and finally digital — has fascinated humans for several hundred years. (For a survey of the field, see “Voices from the Machine” in the February 2004 issue of EM.) Until recently, attempts to synthesize the human voice have mostly been the province of academic researchers using high-end computer systems. But high-speed microprocessors have brought speech synthesis within reach of the desktop musician. Cantor, along with Yamaha's Vocaloid technology, is the first of what is sure to be a continuing flow of software for synthesized singing. (For a review of Zero-G's Vocaloid 1.02 Leon and Lola, see the August 2004 issue of EM.)
Cantor can sing eight monophonic parts, each with its own notes, lyrics, and voice. The notes and lyrics are entered in a piano-roll-style Score editor, in which you can also enter automation for various voice parameters. Voice programming involves two separate editors: Voice and Phoneme. Voice is a synthesizer for simulating the vocal cords and breath, and Phoneme is a morphing formant filter for generating phonemes, the basic building blocks of speech. Cantor converts lyrics, entered as text, to phonemes using a 120,000-word Pronouncing Dictionary provided by Carnegie Mellon University.
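To get a feel for what a text-to-phoneme lookup involves, here is a minimal Python sketch. It is not Cantor's code: the four-entry dictionary is a hypothetical excerpt written in the style of the Carnegie Mellon dictionary (phoneme symbols with stress digits on the vowels), and the function names are invented for illustration.

```python
# Hypothetical mini-dictionary in the style of the CMU Pronouncing Dictionary.
# The real dictionary is far larger; how Cantor stores it internally is not known here.
MINI_DICT = {
    "cow": ["K", "AW1"],
    "sing": ["S", "IH1", "NG"],
    "la": ["L", "AA1"],
    "hello": ["HH", "AH0", "L", "OW1"],
}


def strip_stress(phoneme):
    """Drop the trailing stress digit (0, 1, or 2) that CMU-style vowels carry."""
    return phoneme.rstrip("012")


def lyric_to_phonemes(lyric, dictionary=MINI_DICT):
    """Return one phoneme list per word, or None for words not in the dictionary."""
    result = []
    for word in lyric.lower().split():
        entry = dictionary.get(word.strip(".,!?"))
        result.append(None if entry is None else [strip_stress(p) for p in entry])
    return result


print(lyric_to_phonemes("Hello cow"))
```

A word that is missing from the dictionary comes back as None in this sketch. That is one reason a program like Cantor lets you view and edit the generated phonemes by hand.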
Cantor's user interface is divided into five pages: one for each of the three editors just described (Score, Voice, and Phoneme); an effects page (FX); and a mixer page (Mix). Each of Cantor's monophonic parts has its own editor settings as well as settings for three insert effects (Distortion, Delay/Echo, and Chorus). The mixer mixes the eight parts and controls the reverb-send level for each part. On the Mix page, you can specify a MIDI channel and note range for each part. Those apply when Cantor is played live or is used as a plug-in.
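The channel and note-range settings amount to a simple routing rule: an incoming note reaches a part only if the channel matches and the note falls inside that part's range. The following Python sketch illustrates the idea under that assumption. The Part class, its field names, and the sample ranges are hypothetical and are not taken from Cantor.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical stand-in for one part's Mix-page MIDI settings."""
    name: str
    channel: int    # MIDI channel, 1 through 16
    low_note: int   # lowest MIDI note number the part responds to
    high_note: int  # highest MIDI note number the part responds to


def parts_for_note(parts, channel, note):
    """Return the names of all parts that would respond to this note."""
    return [
        p.name
        for p in parts
        if p.channel == channel and p.low_note <= note <= p.high_note
    ]


parts = [
    Part("Soprano", channel=1, low_note=60, high_note=84),
    Part("Bass", channel=1, low_note=36, high_note=59),
    Part("Lead", channel=2, low_note=0, high_note=127),
]
print(parts_for_note(parts, channel=1, note=64))
```

With a setup like this, two parts on the same channel act as a keyboard split, while a part on its own channel can be addressed independently by the host.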
STEP BY STEP
Cantor's Score editor is used to enter and edit notes and their associated lyrics (see Fig. 1). It contains tools for entering, selecting, moving, deleting, copying, and pasting notes in a familiar piano-roll display. Knobs along the left edge of the Score page control a variety of voice parameters including vibrato, brightness, balance (between voiced and unvoiced phonemes), gender bending, and breathiness.
Score is arguably Cantor's most important page, but not its most elegant. All note manipulation must be done onscreen (there's no MIDI note entry), tools must be selected with the mouse (there are no key commands), and the display is not vertically resizable, which results in small text-entry fields and the potential for notes to get cluttered. Those issues aren't showstoppers, but some upgrading of the Score editor would be welcome.
Notes are entered by clicking on them with a pencil tool. Notes automatically get the one-syllable default lyric “La.” Once a note is entered, you can click on its lyric with the pencil tool and type in text. You can enter polysyllabic words, but single syllables work best. You can follow a syllable with a hyphen to tell Cantor that the syllable is connected to the syllable in the next note. That gives you independent control of pitch and note-specific automation for each syllable, while still getting the correct phonemes from the Pronouncing Dictionary. You can optionally display and edit the phonemes that Cantor generates from your text. For a more creative approach, you can type in strings of phonemes and forget the text.
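The hyphen convention can be pictured as a grouping step: syllables that end in a hyphen are joined to the note that follows, and the whole word is then looked up. The Python sketch below shows one plausible way to do that grouping. It is an assumption about the general approach, not a description of Cantor's internals, and the function name is invented.

```python
def group_syllables(note_lyrics):
    """Join hyphen-linked syllables into words.

    Returns a list of (word, note_indices) pairs, where note_indices lists
    the notes that carry the word's syllables.
    """
    words = []
    pieces, indices = [], []
    for i, lyric in enumerate(note_lyrics):
        indices.append(i)
        if lyric.endswith("-"):
            # A trailing hyphen means the word continues on the next note.
            pieces.append(lyric[:-1])
            continue
        pieces.append(lyric)
        words.append(("".join(pieces), indices))
        pieces, indices = [], []
    if pieces:
        # The last note ended in a hyphen with nothing after it.
        words.append(("".join(pieces), indices))
    return words


print(group_syllables(["hel-", "lo", "world"]))
```

Each word keeps track of the notes it spans, which is what would let a dictionary lookup on the full word hand the right phonemes back to individual notes.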
The smaller window below the piano-roll display is for editing note Velocity and for creating automation envelopes for some of the voice parameters previously mentioned. Envelopes are associated with individual notes, which gives you incredibly detailed control — for example, you can apply a different pitch contour to each note. Envelope editing is easy and intuitive.
SPEAK TO ME
Once you have entered notes and lyrics, you can either let Cantor play the sequence or trigger individual notes in the sequence from your MIDI keyboard or from a host application when Cantor is running as a plug-in.
If Cantor plays the sequence, then the sequenced pitches, Velocities, and lyrics will be used. If you trigger the notes from an external source (keyboard or host software), then the external source controls the pitch and Velocity. But the lyrics, and more importantly the note order, are still controlled by Cantor. You can step through the notes only in the order that they appear in Cantor's score. Aside from setting loop boundaries to control where the sequence starts and stops, you have no control over which notes are played. That inhibits Cantor's use as a live instrument to a certain degree, but it is not a fatal flaw. A future update will have MIDI control of note selection.
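The division of labor described above, in which the player supplies pitch and Velocity while the score supplies the next lyric, can be modeled in a few lines of Python. This is a hypothetical sketch: the class name, the method name, and the wrap-around behavior at the loop end are assumptions made for illustration.

```python
class LyricStepper:
    """Hands out lyrics in score order, one per incoming note."""

    def __init__(self, lyrics, loop_start=0, loop_end=None):
        self.lyrics = list(lyrics)
        self.loop_start = loop_start
        # loop_end is exclusive; by default the loop covers the whole score.
        self.loop_end = len(self.lyrics) if loop_end is None else loop_end
        self.position = loop_start

    def note_on(self, pitch, velocity):
        """Pair the external pitch and velocity with the next lyric in order."""
        lyric = self.lyrics[self.position]
        self.position += 1
        if self.position >= self.loop_end:
            # Assumed behavior: wrap back to the loop start.
            self.position = self.loop_start
        return (lyric, pitch, velocity)


stepper = LyricStepper(["La", "di", "da"])
for pitch in (60, 72, 67, 60):
    print(stepper.note_on(pitch, 100))
```

Whatever keys you press, the lyrics come out in score order. The only steering available is the choice of loop boundaries.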
When Cantor is used as a plug-in, you can select between Automatic and Manual modes. In Manual mode, MIDI from the host is handled just as it is from a MIDI keyboard. In Automatic mode, Cantor plays the score as it does when it's running standalone, but the host controls playback tempo and position. A hybrid mode, in which the host controls tempo, position, pitch, and Velocity, would be a welcome addition.
Cantor's voice is controlled from the Voice and Phoneme pages, and each of Cantor's eight parts can have a different voice. You can think of the Voice page as the synthesizer's sound generator and the Phoneme page as its filter. The sound generator simulates breath and vocal cords. The filter simulates the filtering effect of the mouth, tongue, and nasal tract.
The sound generator uses a combination of additive synthesis and noise sculpting (see Fig. 2). It is used only for voiced sounds; the phoneme set has complete control of unvoiced sounds. (The division of speech between voiced and unvoiced sounds is more complicated than differentiating vowels from consonants, but that communicates the basic idea.) For voiced sounds, the additive synth controls the pitched component of the sound (vocal cords), whereas the noise synth controls the breath component (whisper). The Breath knob together with any Breath automation controls the mix of the two.
FIG. 2: Cantor's Voice editor uses additive synthesis (top) to simulate the vocal cords and filtered noise (bottom) to simulate breath.
The Voice page will be familiar to you if you have used any of VirSyn's synths. The top half of the display is for editing the levels of the partials in the additive waveform. Each partial represents a sine wave at a whole-number multiple of the fundamental frequency: the leftmost is the fundamental, the next is an octave higher (2x), the next is a fifth above that octave (3x), and so on. You can have as many as 256 partials, but as you go higher, they are controlled in groups. The first 32 (the most important) can be set individually, and as you move the mouse over the window, the targeted partial turns red. Although all partials are always displayed, a scrolling numerical field allows you to control how many are produced; fewer partials mean less computation.
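The additive idea is easy to express in code. The following Python sketch sums sine waves at whole-number multiples of a fundamental, with one level per partial and an optional limit on how many partials are computed. It is a textbook illustration rather than VirSyn's implementation, and the function and parameter names are invented.

```python
import math


def additive_sample(levels, fundamental_hz, t, num_partials=None):
    """Return one sample of an additive waveform at time t (in seconds).

    levels[0] is the level of the fundamental, levels[1] of the 2x partial,
    levels[2] of the 3x partial, and so on.
    """
    active = levels if num_partials is None else levels[:num_partials]
    return sum(
        level * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
        for k, level in enumerate(active)
    )


def render(levels, fundamental_hz, sample_rate=44100, num_samples=441):
    """Render a short buffer of samples (10 ms at the default settings)."""
    return [
        additive_sample(levels, fundamental_hz, n / sample_rate)
        for n in range(num_samples)
    ]


# A 220 Hz tone with a strong fundamental and progressively weaker overtones.
buffer = render([1.0, 0.5, 0.25, 0.125], 220.0)
print(len(buffer))
```

The cost of each sample grows with the number of active partials, which is why trimming the partial count saves CPU.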
The window below the Partials window is for entering the noise-transfer function, which is a fancy name for a spectrum filter applied to white noise. You draw in the shape of the filter, then use the Breath control on the Score page to determine the mix of filtered noise with the output of the additive synth.
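A spectrum filter of this kind can be sketched by transforming a block of white noise to the frequency domain, scaling each frequency bin by the drawn curve, and transforming back. The Python example below does that with a small, slow discrete Fourier transform so that it needs no external libraries. The linear crossfade used for the Breath mix is an assumption, as are all of the names. Cantor's actual method is not documented here.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform (fine for short blocks)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def filter_noise(gain_curve, num_samples=64, seed=0):
    """Shape white noise with a drawn gain curve (index 0 = DC, last = Nyquist).

    Assumes num_samples is at least 2.
    """
    rng = random.Random(seed)
    noise = [rng.uniform(-1, 1) for _ in range(num_samples)]
    spectrum = dft(noise)
    half = num_samples // 2
    shaped = []
    for k, value in enumerate(spectrum):
        # Bins above Nyquist mirror the lower bins, so reuse the same gain.
        freq_index = k if k <= half else num_samples - k
        position = freq_index / half * (len(gain_curve) - 1)
        shaped.append(value * gain_curve[round(position)])
    return idft(shaped)


def breath_mix(voiced, noise, breath):
    """Assumed linear crossfade: 0 is all pitched sound, 1 is all noise."""
    return [(1 - breath) * v + breath * x for v, x in zip(voiced, noise)]


# A curve that passes low frequencies and removes the top of the spectrum.
dark_noise = filter_noise([1.0, 0.5, 0.0])
print(len(dark_noise))
```

A flat curve leaves the noise unchanged, whereas a curve with peaks imposes a formant-like coloration on the breath component.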
PHONEMES AND FORMANTS
Try to say something without moving your tongue or jaw, and you will immediately grasp the importance of Cantor's Phoneme page. You move your mouth to sculpt the sound of your vocal cords and breath into words. The mouth is an extremely complex filter.
FIG. 3: Cantor's Phoneme editor consists of two formant filters. Phonemes are produced by morphing from the top to the bottom filter.
Complex as the mouth is, only 39 filter pairs are needed to sculpt all of the sounds used to construct all of the words in the English language. (That number varies only slightly depending on the language and the speech theory being used.) Two filters are used per phoneme to reproduce the change that occurs within some phonemes, such as the “ow” in “cow.”
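Morphing between two formant filters can be pictured as interpolating the formant frequencies over the duration of the phoneme. The Python sketch below uses approximate textbook formant values for an “ah” vowel and an “oo” vowel to imitate the glide in “ow.” The numbers are illustrative averages and are not taken from Cantor's phoneme sets. Straight-line interpolation is an assumption as well.

```python
# Approximate first three formant frequencies in Hz (illustrative textbook
# averages for an adult male voice, not values from Cantor).
AH_FORMANTS = [730.0, 1090.0, 2440.0]
OO_FORMANTS = [300.0, 870.0, 2240.0]


def morph_formants(start, end, position):
    """Linearly interpolate formant frequencies.

    position runs from 0.0 (start filter) to 1.0 (end filter).
    """
    return [a + (b - a) * position for a, b in zip(start, end)]


# Step through the glide in quarters of the phoneme's duration.
for step in range(5):
    print(morph_formants(AH_FORMANTS, OO_FORMANTS, step / 4))
```

A steady vowel would simply use the same settings for both filters, so the morph has no audible effect.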
Setting up phoneme filters for intelligible speech is not a job for the faint-hearted. Fortunately, you do not have to take on that task. Cantor comes with six factory sets of English phonemes, and sets for other languages are planned for a future release.
Fig. 3 shows Cantor's Phoneme List. If you choose one of the 16 user sets, which are filled with Factory Set 1 by default, you can edit the filters for each phoneme. A context menu allows you to copy and paste filters between phonemes, and you can produce some interesting language mutations by modifying the user phoneme sets. You can also enter phonemes directly in the score to produce vocal sequences that are not words. (If you plan to do that, printing out a screen shot of the phoneme list is a good idea.)
THE LAST WORD
Cantor's three insert effects can have different settings (including off) for each part. Distortion has soft, tube, and tape models; Delay/Echo can be mono, stereo, or ping-pong; and Chorus has phasing and flanging. Reverb is global with an individual send for each part.
Cantor saves two kinds of files: projects and presets. Projects contain all information on all pages for all parts — in other words, a complete Cantor setup. Presets contain all voice settings for a single part, meaning all settings from the Voice, Phoneme, and FX pages, except for the reverb settings. That makes it possible to exchange voices between parts without affecting the notes or lyrics. Cantor ships with 29 voice presets covering everything from male and female singing voices to extreme effects (see Web Clip 1).
Some Cantor preset voices are considerably more intelligible than others. You can get understandable lyrics out of Cantor, but as mentioned, that is probably not its best use. It truly shines at producing a wide range of intriguing vocal sounds, which, used judiciously, can enliven a mix. You can download a save-disabled demo of Cantor as well as a variety of sound clips from VirSyn's Web site.
Minimum System Requirements
MAC: G4/400 MHz; 256 MB RAM; Mac OS 10.2
PC: Pentium III/600 MHz; 256 MB RAM; Windows XP
Cantor 1.02 (Mac/Win)
FEATURES: 4.0
EASE OF USE: 2.5
QUALITY OF SOUNDS: 3.5
VALUE: 3.0
RATING PRODUCTS FROM 1 TO 5
PROS: Powerful and flexible additive-synthesis and formant-filter-morphing engine. Massive text-to-phoneme dictionary. Built-in song sequencer.
CONS: Note and lyric entry scheme is cumbersome. Limited ability to select lyrics with MIDI. No MIDI note entry.