Say It With Pictures

Everybody knows that a picture is worth a thousand words. But did you know that a picture can also buy you the frequencies of a thousand oscillators? Using any of several modern music applications, you can convert the data that forms a picture on the computer screen into information that will generate an audio file. In most cases, the program uses the vertical position of each pixel of the image to control frequency and one or more color (RGB) values for parameters such as amplitude and stereo position.

In this article, I'll look at a number of programs for both Mac and Windows that allow you to perform this alchemy. Included are Adobe Audition 3, Thomas Baudel's HighC 2.2, Camel Audio's Cameleon 5000 1.5, Rasmus Ekman's CoagulaLight 1.66, Nicolas Fournel's AudioPaint 2.1, Image Line FL Studio 8, U&I Software MetaSynth 4, and VirSyn Poseidon 1.4. I'll also cover Mark Coniglio's Isadora 1.2.9, which can convert images to sound but is better suited to work in the opposite direction (see the sidebar “Izzy Gets Down”). I'll start with a general overview of the field before looking at each program individually.

Note that AudioPaint, Coagula, and HighC are standalone applications whose only role in life is the conversion of images to sound and, perhaps, vice versa. For Audition, Cameleon, FL Studio, MetaSynth, and Poseidon, image-to-sound conversion is just one of many features. Also, many of the ideas that drive image-to-sound software stem from research by Iannis Xenakis, a Greek composer whose UPIC system was among the first to allow musicians to draw the data used to generate sounds (see the online bonus material “The UPIC System”). Xenakis used imagery in creative ways when composing both his acoustic and electronic works.

FIG. 1: This image shows the original bitmap (left) that Audition used to produce the spectrum on the right.

All of the programs in this roundup share the ability to convert an image into an audio file, but the range of editing and processing features you'll find varies widely. For example, all except HighC allow you to import a preexisting bitmap image (typically a BMP or PICT file, but in some cases other formats as well), and all but Cameleon, FL Studio, and Poseidon let you modify or process an image before using it to generate a sound file. Audition, FL Studio, MetaSynth, and Poseidon convert the graphics file into a 2-D sonogram display, which shows frequency on the y-axis and time on the x-axis and uses intensity (brightness) to represent amplitude (see Fig. 1). HighC uses a hybrid musical-score/piano-roll metaphor and, like the original UPIC system, provides tools for drawing the gestures and shapes that will control musical parameters. Coagula and MetaSynth also provide tools for drawing an image from scratch, while AudioPaint lets you generate a new image automatically with its configurable Lines & Curves and Clouds of Points tools.

Deciding how to extract data from the bitmap and how to use the extracted data is a big part of these programs' toolkits. Poseidon takes each pixel on the y-axis of the image and assigns it a frequency value within the range of 20 Hz to 22.05 kHz, then uses additive synthesis to generate a new sound by summing sine waves at those frequencies. The amplitude of each partial changes over time depending on the brightness of the pixel, and each pixel along the x-axis accounts for about 3 ms of the new sound's duration. Most of the other programs work in a similar manner, though several (for example, AudioPaint and Audition) let you choose how the program will interpolate from one amplitude value to the next. FL Studio lets you specify an arbitrary number of partials (up to 999) for the resynthesis regardless of the size of the original image, and AudioPaint, Audition, Coagula, FL Studio, and MetaSynth let you set an arbitrary frequency range over which new partials will be generated.
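
To make that mapping concrete, here is a minimal Python sketch of bitmap-to-sound additive synthesis. It assumes a grayscale image stored as nested lists of brightness values from 0.0 to 1.0, a linear frequency spread from 20 Hz to 22.05 kHz, and roughly 3 ms per pixel column, as described for Poseidon. The function name render_additive is hypothetical, and none of these programs necessarily works this way internally.

```python
import math

def render_additive(image, f_min=20.0, f_max=22050.0,
                    sample_rate=44100, column_ms=3.0):
    """Hypothetical sketch: render a grayscale bitmap as mono additive synthesis.

    image is a list of rows of brightness values from 0.0 to 1.0. Row 0 is
    the top of the picture and gets the highest frequency; each column lasts
    column_ms milliseconds, and brightness sets each partial's amplitude.
    """
    height, width = len(image), len(image[0])
    samples_per_col = int(sample_rate * column_ms / 1000.0)
    # One sine partial per pixel row, spaced linearly from f_max down to f_min.
    step = (f_max - f_min) / (height - 1) if height > 1 else 0.0
    freqs = [f_max - row * step for row in range(height)]
    phases = [0.0] * height
    out = []
    for col in range(width):
        for _ in range(samples_per_col):
            s = 0.0
            for row in range(height):
                s += image[row][col] * math.sin(phases[row])
                # Advance and wrap each phase so partials stay continuous
                # across column boundaries (no clicks from phase resets).
                phases[row] = (phases[row] + 2 * math.pi * freqs[row]
                               / sample_rate) % (2 * math.pi)
            out.append(s / height)  # divide by row count to stay within [-1, 1]
    return out
```

Rendering a 2-by-3 test image yields 3 × 132 = 396 samples, since 132 samples is about 3 ms at 44.1 kHz. A real program would add interpolation between columns, stereo mapping, and much faster vectorized math.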

Some of the programs (for instance, AudioPaint, Audition, and FL Studio) let you determine whether the new partials are distributed in a linear or log fashion (linear distributes the frequencies in increments of hertz, while log uses increments of cents). MetaSynth goes quite a bit further by including a large number of tuning options. You could, for example, space the partials of the new sound in steps of different types of traditional scales (whole tone, major/minor, and so on), using various microtonal increments (from 4 to 1,024 divisions to the octave, with more than 1,000 scales included), or, like AudioPaint and Poseidon, using scales in Scala format. (Currently, more than 4,000 different scales are available at the Scala Web site.) HighC lets you create your own pitch/frequency scales, but you have to type in the list of tuning increments yourself. (I'll discuss the other programs, including the additional extraction parameters they offer, in their respective sections.)
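
The difference between linear and log spacing is easy to see in code. This sketch assumes the bottom pixel row sits at a chosen lowest frequency and each row above it adds a fixed step, in hertz for linear mode or in cents for log mode (1,200 cents per octave, so 100 cents is one equal-tempered semitone). Dividing the octave into N equal steps, as with MetaSynth's microtonal options, is simply a log step of 1,200/N cents. Both function names are hypothetical.

```python
def row_frequencies(height, f_low, step, mode="linear"):
    """Hypothetical helper: one frequency per pixel row, bottom row first.

    linear: each row is `step` hertz above the row below it.
    log:    each row is `step` cents above the row below it.
    """
    if mode == "linear":
        return [f_low + i * step for i in range(height)]
    if mode == "log":
        # A step of c cents multiplies the frequency by 2 ** (c / 1200).
        return [f_low * 2 ** (i * step / 1200.0) for i in range(height)]
    raise ValueError("mode must be 'linear' or 'log'")


def edo_step_cents(divisions):
    """Cents per step when the octave is split into `divisions` equal parts."""
    return 1200.0 / divisions
```

Starting from a 440 Hz bottom row, 13 rows in 100-cent steps end at 880 Hz, exactly one octave up, while 13 rows in 100 Hz steps end at 1,640 Hz. With linear spacing, equal hertz steps sound progressively closer together in pitch as you move up the image.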

FIG. 2: Cameleon displays the converted image using adjustable sliders that represent the amplitude for each partial (top). A separate set of sliders is available for adjusting the noise components (bottom).

In addition to using additive synthesis to generate the new sound, FL Studio offers a nonadjustable form of granular synthesis. HighC and MetaSynth can use additive along with several other synthesis methods, including granular and FM. AudioPaint and MetaSynth will create a new sound by using the extracted data to control the playback parameters of samples (pitch-shifting and time-stretching, for instance).

Cameleon can display the converted image as an editable two-dimensional (frequency and amplitude) spectral plot and offers handy tools to manipulate the newly generated harmonic spectrum prior to synthesizing the sound. It also has the ability to use information extracted from an image to generate discrete bands of noise and allows you to alter the amplitudes of both the harmonic partials and noise components individually (see Fig. 2). Audition also lets you add a bit of random offset to the individual frequency components as part of the conversion process.

It's Not How Long You Make It

You can set the duration of the new file to any arbitrary length (with limits in some cases) in AudioPaint, Coagula, FL Studio, HighC, and MetaSynth. Once the program works its magic, all except AudioPaint, Coagula, and HighC allow you to manipulate the new audio file in various ways. As a full-featured sound-design “workstation,” MetaSynth offers an especially large number of effects and processes for that purpose, while Cameleon and Poseidon provide the professional processing and performance tools you'd expect from high-quality modern soft synths. Audition also has a wide range of options for working with your new audio file and is the only program in this group to provide a traditional multitrack-audio interface.

All of the programs allow you to export your new audio file, with 16-bit WAV being the most commonly supported format. And of the programs that can show a sonogram display of the converted image file, all but Poseidon let you export the display as a graphic image, perhaps for external processing and reimporting (FL Studio lets you copy the image and paste it into an external image editor).

Documentation varies widely among these programs, with Audition (which supplies the only printed documentation) and MetaSynth offering the best in the class. HighC provides some useful getting-started tutorials; AudioPaint, a short getting-started PDF; and Coagula, a thorough online help system (you can open the HLP file directly on the desktop). You'll also find users forums for a number of the programs, some of which are more active than others.

Keep in mind that in most cases, you won't automatically get musically useful results from any random image you choose to convert, regardless of which program you're using. I found that a bit of parameter tweaking, a fair amount of “postprocessing” (reverb, pitch-shifting, and the like), and, above all, a lot of trial and error were often needed for me to get something I could use. Also, the programs that allow you to draw gestures that will control musical parameters tend to be far more useful and, ultimately, satisfying. That's probably no surprise given the long tradition of using graphic symbols to specify musical parameters in Western music.

Adobe Audition 3 (Win, $349)

Audition is a full-featured stereo and multitrack audio editor, and given its family tree, which includes video and graphics apps Photoshop, Premiere Pro, and After Effects, it's no surprise that it boasts some innovative approaches to working with images. Audition's Spectral Frequency Display is where you work with spectra you generate both by analyzing an audio file and by importing graphic images, and the features it offers in both cases are identical.

By default, the program maps an imported image using increments of either 100 Hz per pixel (in linear mode) or 100 cents per pixel (in log mode) and lets you specify a different increment only if you already have an analysis open and a region selected. It has three additional options that determine how individual lines of pixels generate new frequency components: Pure Tones, which uses a 1-pixel-per-partial mapping and tends to produce rather static, harsh additive sounds; Random Noise Bands, which adds a bit of randomness to the partials' frequencies; and Track Frequency Spectrum, which is not clearly explained but produced the most interesting results in my tests regardless of the source files (see Web Clip 1).

One of the interesting options Audition provides is the ability to mask (filter) the spectrum of one file with that of another. For example, you could save the display of one sound's spectrum as a BMP file, then analyze a second sound and view its spectrum in the Spectral Frequency Display. You could then import the graphic image of the first sound's spectrum and use it to filter out some of the components of the second sound. In most cases, a good bit of trial and error is required, but the technique has a lot of potential for things like cross-synthesis and cloning the resonant qualities of one sound onto another.
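
The article doesn't describe how Audition performs this masking internally, but the general idea can be sketched as multiplying one sound's magnitude spectrogram by the brightness of a same-size image, bin by bin. In these hypothetical Python helpers, both inputs are lists of rows (frequency bins) by columns (time frames), and mask values run from 0.0 (remove) to 1.0 (keep). Turning the masked spectrogram back into audio would still require an inverse STFT, which is not shown.

```python
def pixels_to_mask(pixels):
    """Hypothetical helper: convert 8-bit gray pixels (0-255) to 0.0-1.0 gains."""
    return [[p / 255.0 for p in row] for row in pixels]


def apply_mask(spectrogram, mask):
    """Hypothetical helper: scale each time-frequency bin by a mask value.

    Dark mask pixels attenuate the matching bins; bright pixels let them pass.
    """
    if len(spectrogram) != len(mask) or any(
            len(s) != len(m) for s, m in zip(spectrogram, mask)):
        raise ValueError("spectrogram and mask must have the same shape")
    return [[s * m for s, m in zip(srow, mrow)]
            for srow, mrow in zip(spectrogram, mask)]
```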

Once you've got a graphic image converted into a spectral display, you can use Audition's powerful spectral-editing tools to modify the spectrum before resynthesizing it. For instance, you can isolate a small segment of the spectrum — all frequencies from 500 to 1,500 Hz between 3 and 6 seconds, for example — and then process just that region with reverb, EQ, or any other effect. And of course, with its robust multitrack mixing and editing options, Audition will make your newly generated audio file feel right at home when combined with any other audio files in your project.
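
Selecting a rectangle such as 500 to 1,500 Hz between 3 and 6 seconds ultimately means picking a range of FFT bins and analysis frames. This sketch does that arithmetic under assumed analysis settings (44.1 kHz, a 2,048-point FFT, and a 512-sample hop). Audition's real settings may differ, and region_to_indices is a hypothetical name.

```python
import math

def region_to_indices(f_lo, f_hi, t_start, t_end,
                      sample_rate=44100, fft_size=2048, hop=512):
    """Hypothetical helper: map a frequency/time rectangle to half-open
    STFT bin and frame ranges.

    Bin k is centered on k * sample_rate / fft_size hertz, and frame m starts
    at sample m * hop. A bin or frame is included when its center frequency
    or start time falls inside the rectangle.
    """
    bin_hz = sample_rate / fft_size
    bins = (math.ceil(f_lo / bin_hz), math.floor(f_hi / bin_hz) + 1)
    frames = (math.ceil(t_start * sample_rate / hop),
              math.floor(t_end * sample_rate / hop) + 1)
    return bins, frames
```

With those settings, each bin is about 21.5 Hz wide, so the example region covers bins 24 through 69 and frames 259 through 516.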

Thomas Baudel's HighC 2.2 (Mac/Win, about $46 [MSRP])

FIG. 3: This is the score for Rob Arnold's piece Study After Xenakis as shown in HighC. You can hear the resulting music in Web Clip 2.

HighC has strong ties to the original Xenakis UPIC system and in many ways is an enhancement to that system. The program's interface looks more like a traditional piano roll than the blank canvas found in most of the other programs, and many of the tools are optimized for drawing curves, lines, and gestures (see Fig. 3 and Web Clip 2). A typical session would start by using the Paint tool to draw some strokes on the canvas, picking a waveform and envelope to use for the sound file the image will generate, and then rendering the sound into audio. You can quickly build up echo, chorusing, or dense cluster effects by copying and pasting strokes, and easily adjust the total duration of the new sound regardless of the size of the original image you drew.

You can create new waveforms for use in rendering, but depending on what type of waveforms you want, the process can be very simple or somewhat oblique. For instance, to build a static waveform containing only harmonic partials, you use an intuitive display in which the amplitude of each partial can be adjusted with a slider. Similarly, to create a noise-based waveform, you can adjust sliders representing the spread and density of the noise. For FM, however, you need to use the main graphic display and create an association between the source sound you want to modulate and a second sound that will serve as a modulator. I haven't seen this type of implementation in any prior software, but to be fair, it is modeled on the original UPIC approach (in fact, it offers much better visual feedback than UPIC), and with a bit of practice it becomes second nature. All new waveforms and any default waveforms that you modify are saved automatically when you save the piece you are working on, and you can access the waveforms you create in one piece when working in another.
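
The harmonic-slider approach boils down to summing sine waves at integer multiples of a fundamental. This sketch assumes each slider stores a linear amplitude for harmonics 1 through N and that all harmonics start in sine phase; HighC's actual waveform engine isn't described in the article, and harmonic_wavetable is a hypothetical name.

```python
import math

def harmonic_wavetable(amplitudes, table_size=2048):
    """Hypothetical helper: build one cycle of a waveform from harmonic sliders.

    amplitudes[0] is the fundamental, amplitudes[1] the 2nd harmonic, and so
    on. The result is normalized so its peak absolute value is 1.0.
    """
    table = []
    for i in range(table_size):
        phase = 2 * math.pi * i / table_size
        table.append(sum(a * math.sin((n + 1) * phase)
                         for n, a in enumerate(amplitudes)))
    peak = max(abs(x) for x in table)
    # Leave an all-zero table untouched rather than dividing by zero.
    return [x / peak for x in table] if peak > 0 else table
```

Setting the sliders to 1, 0, 1/3, 0, 1/5 (odd harmonics falling off as 1/n), for instance, gives the first few terms of a square wave.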

HighC's drawing tools are fairly basic, and though it currently uses only synthesis to create sounds, the developer says a version that can also incorporate samples is in the works. Still, it is unique in its approach to working with image conversion and has a strong heritage from which many interesting compositions have been created. At the developer's Web site (see the online bonus material “Manufacturer Contacts”), you'll find a free trial version that doesn't allow you to export your audio file; links to lots of examples; and some handy tutorials to get you started.

Camel Audio's Cameleon 5000 1.5 (Mac/Win, $199 [MSRP])

Cameleon's image-to-sound conversion features are a relatively minor yet still very useful portion of the program. Like Audition, Cameleon treats audio files and images that you import in much the same way, but Cameleon, like Poseidon, provides an editable set of parameters for both types of files. You can use an image file to produce up to 64 harmonic partials and/or unlimited bands of noise, and you can adjust either the instantaneous amplitude of an individual partial or determine how it evolves over time using an intuitive interface for those purposes. You can also modify the frequencies of all or only some of the partials using the Detune function, which could be useful, for example, for creating sounds with inharmonic spectra. Unlike some of the other programs, Cameleon doesn't handle stereo position (only the height and brightness of pixels are used), so color images will be interpreted as black and white.
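
Cameleon's Detune control isn't documented in detail here, so this sketch only illustrates the underlying idea: start with up to 64 harmonic partials (n × f0) and shift some of them by a number of cents, which breaks the exact integer ratios and makes the spectrum inharmonic. The function and its detune_cents dictionary are hypothetical.

```python
def partial_frequencies(f0, count=64, detune_cents=None):
    """Hypothetical helper: frequencies of `count` harmonic partials of f0.

    detune_cents maps a 1-based partial number to an offset in cents;
    partials not listed stay exactly harmonic. Any nonzero offset makes
    the spectrum inharmonic.
    """
    detune_cents = detune_cents or {}
    return [n * f0 * 2 ** (detune_cents.get(n, 0.0) / 1200.0)
            for n in range(1, count + 1)]
```

Passing {n: 5.0 * (n - 1) for n in range(1, 65)}, for example, would progressively sharpen the higher partials, similar in spirit to the stretched partials of a piano string.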

Once you've imported a graphics file, you can make edits to selected groups of partials (only odd or only even partials, for instance), or use the Formant Filter (with its various presets) to sculpt out a portion of the sound. Cameleon also provides a number of preset spectra for both harmonic partials and noise components, so you could easily combine the spectrum generated by a graphic image with the noise components of a vocal preset and adjust the relative amounts of each.

You can use Cameleon's Morph Square to morph among four different sets of sound parameters that you've generated using up to four different graphics files, or mix and match sounds that you've created in different ways. It's easy to build your own pathways for automated morphing or use any of the Morph Timeline presets that come with the program. You can also adjust the Morph position in real time and capture that output to disk.
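
How Cameleon blends its four corners isn't described in the article, but a natural model for an XY morph pad is bilinear interpolation of the partial amplitudes. This hypothetical sketch assumes four equal-length amplitude lists, one per corner, and a position with x and y each running from 0.0 to 1.0.

```python
def morph_square(corners, x, y):
    """Hypothetical helper: bilinearly blend four partial-amplitude lists.

    corners = (bottom_left, bottom_right, top_left, top_right);
    x and y run from 0.0 to 1.0 across the square.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must be between 0 and 1")
    bl, br, tl, tr = corners
    # Each corner's weight shrinks as the point moves away from it.
    weights = ((1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y)
    return [weights[0] * a + weights[1] * b + weights[2] * c + weights[3] * d
            for a, b, c, d in zip(bl, br, tl, tr)]
```

Automating a pathway then amounts to calling the function with a changing sequence of (x, y) positions.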

Cameleon will export a discrete time-domain plot (frequency versus amplitude) and convert it into a sonogram display. You can then manipulate this image with a graphics program or simply examine it to better understand the spectral components of your sounds. This technique, and others related to image-to-sound conversion, is nicely covered in a thorough text-based tutorial.

I wasn't very familiar with Cameleon before writing this roundup, but I found it to be one of the best-sounding soft synths I've come across in a long time. If you're looking for a professional soft synth that includes respectable image-to-sound features, Cameleon is a good place to start.

Rasmus Ekman's CoagulaLight 1.66 (Win, free)

FIG. 4: Coagula has a robust set of brushes for drawing or modifying images.

CoagulaLight has been in “late beta” for some time but remains under active development. Though it won't replace Photoshop on your desktop, the program offers a wide range of brushes for creating new images as well as a rich set of tools for processing new or imported graphics files (see Fig. 4). Open an existing bitmap image, and you'll immediately have access to the Move, Zoom-rotate, and Skew-flip features. Couple those tools with the ability to create mirror images in a single step, and you can quickly create complex mosaics and other highly transformed variations on your pictures before rendering them into sound.

Coagula lets you map the extracted image data to any arbitrary frequency range (between 0.001 Hz and any frequency) and adjust the new sound file's duration from a fraction of a second to several hours. You can enable the Soft Envelope Sweep feature to produce very smooth transitions between pixel values (with the trade-off of slightly longer render times), and you can choose to render only a selected portion of the image or stop the render midway (the portion that is rendered will be playable). A slider lets you dial in the amount of noise you want added to each sine wave individually before the final render.
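
Coagula's Soft Envelope Sweep algorithm isn't spelled out, but the trade-off it addresses can be shown with a simple model. Holding each column's brightness for the whole column produces abrupt amplitude steps (and clicks), while ramping linearly toward the next column's value gives smooth transitions at the cost of extra computation. The linear ramp and the column_envelope name are assumptions for illustration; the same choice underlies the interpolation options in AudioPaint and Audition mentioned earlier.

```python
def column_envelope(values, samples_per_column, smooth=False):
    """Hypothetical helper: expand per-column brightness into a per-sample
    amplitude envelope.

    smooth=False holds each value for the whole column (abrupt steps);
    smooth=True ramps linearly toward the next column's value.
    """
    env = []
    for i, v in enumerate(values):
        # The last column has no successor, so it simply holds its value.
        nxt = values[i + 1] if smooth and i + 1 < len(values) else v
        for s in range(samples_per_column):
            frac = s / samples_per_column
            env.append(v + (nxt - v) * frac)
    return env
```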

If you're a fan of granular synthesis, you'll find the Spray brush very handy (note that Ekman is also the developer of Granulab, a very powerful, real-time granular-synthesis application). You can control the size and shape of the brush by clicking and dragging in the small window in the Brush dialog box (the brush updates as you move the mouse), and you can create color gradients that will translate into panning parameters of your new sound.

Like MetaSynth, Coagula provides several folders full of filters with which you can alter new or existing files, and you can adjust how much a filter impacts the R, G, and B values of an image independently. You can also add your own filters; any BMP file will do. I posted a collection of files for this purpose online, some of which produce rhythmic patterns when applied to any image (see Web Clip 3). Try running several filters in series to produce polyrhythmic effects, or apply a filter using 100 percent R and 0 percent G, then flip and mirror the filter and apply it again using 100 percent G and 0 percent R to get different types of processing on the left and right channels (see Web Clips 4 and 5 for an example).
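
Coagula's exact blending formula isn't described here, so this sketch assumes a simple multiplicative filter: each channel of the image is scaled by the matching channel of a same-size filter bitmap, with a per-channel amount between 0.0 (no effect) and 1.0 (full effect). Because red and green drive the left and right channels in the trick described above, applying a filter at 100 percent R and 0 percent G touches only the left side. The apply_filter name and the blend law are hypothetical.

```python
def apply_filter(image, filt, amount_r=1.0, amount_g=1.0, amount_b=1.0):
    """Hypothetical helper: multiply an RGB image by a same-size RGB filter.

    Pixels are (r, g, b) tuples with 0-255 components. Each amount (0.0-1.0)
    sets how strongly the filter affects that channel: 0.0 leaves the
    channel untouched, 1.0 applies the filter fully.
    """
    def mix(p, f, amount):
        # Blend between unity gain and the filter pixel's gain.
        gain = (1.0 - amount) + amount * (f / 255.0)
        return round(p * gain)

    return [[(mix(p[0], f[0], amount_r),
              mix(p[1], f[1], amount_g),
              mix(p[2], f[2], amount_b))
             for p, f in zip(prow, frow)]
            for prow, frow in zip(image, filt)]
```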

Nicolas Fournel's AudioPaint 2.1 (Win, free)

AudioPaint is an intuitive program that packs a lot of power under a very simple interface. It is one of only two programs here that can generate a new sound using image data to control samples, and it has a batch-processing option that allows you to generate dozens of individual audio files from a folder full of graphic images. Though it doesn't offer any painting tools or filters for editing an image you import or create in the program, the three image generators it provides are very flexible and tend to produce pictures that result in very interesting audio without a lot of effort by the user.

FIG. 5: This image was created using AudioPaint's Clouds of Points tool. Listen to Web Clip 6 to hear the sound this image generated.

Clouds of Points, for instance, is used to create a grainy image with dozens or even hundreds of small colored points, each of which will trigger a sonic event (see Fig. 5 and Web Clip 6). You can determine the overall height and width of the image, the number of Clouds it will contain, and the number of points that will make up each Cloud. You can also pick the shape for the individual Clouds or simply choose the Random option, which lets the program pick all the parameter values randomly. Equally innovative and potentially useful is the Random Web Picture feature, which grabs nine random images from the Internet, displays them as thumbnails, then lets you pick the one that looks the most interesting.
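
AudioPaint's actual generator isn't documented in the article, but a Clouds of Points image can be approximated by scattering points around random cloud centers. This sketch assumes a Gaussian spread around each center and clamps points to the canvas; the function name, the spread parameter, and the seed argument are hypothetical.

```python
import random

def clouds_of_points(width, height, n_clouds, points_per_cloud,
                     spread=10.0, seed=None):
    """Hypothetical helper: scatter Gaussian clusters of points on a canvas.

    Returns a list of (x, y) pixel positions; each point would become one
    short sonic event (x = onset time, y = pitch).
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_clouds):
        cx, cy = rng.uniform(0, width), rng.uniform(0, height)
        for _ in range(points_per_cloud):
            # Clamp so points that fall off the edge stay on the canvas.
            x = min(max(int(rng.gauss(cx, spread)), 0), width - 1)
            y = min(max(int(rng.gauss(cy, spread)), 0), height - 1)
            points.append((x, y))
    return points
```

Passing a seed makes the result repeatable, which is handy when you want to re-render the same texture with different audio settings.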

As with the other programs, you can import a preexisting graphics file, but AudioPaint supports more formats than most of the rest of this group (JPEG, PNG, GIF, and BMP). Once an image is loaded, you can use the Audio Settings dialog box to pick the minimum and maximum frequencies for your new file (using hertz or note/octave increments) and choose between sine waves and a sample for the new sound's source. (Be sure to lower the maximum frequency if you're using samples, as the program does not filter out frequencies above the Nyquist frequency. Or perhaps you'd prefer to experiment with aliasing.) You can also determine which parameter of the image (Red, Green, Hue, Saturation, or Brightness) is used to generate the audio on the left and right channels independently.
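
The aliasing warning is simple arithmetic. Any component above the Nyquist frequency (half the sampling rate) folds back into the audible range as a mirror image, so a sample pitched up until one of its harmonics lands at 24 kHz would be heard at 20.1 kHz in a 44.1 kHz file. This helper, whose name is hypothetical, computes the folded frequency for an ideal sampler with no anti-aliasing filter.

```python
def aliased_frequency(f, sample_rate):
    """Hypothetical helper: frequency actually heard when a component at
    f Hz is sampled at sample_rate with no anti-aliasing filter."""
    f = f % sample_rate  # sampling cannot distinguish f from f + k * sample_rate
    # Components above Nyquist mirror back around sample_rate / 2.
    return sample_rate - f if f > sample_rate / 2 else f
```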

AudioPaint may be a one-trick pony, but I was able to achieve very good results with fairly minimal effort, especially using the built-in generators. If you make it to Fournel's Web site, check out his other innovative freeware (all Windows only) to see if something else suits your fancy.

Image Line FL Studio 8 Express Edition (Win, $49)

FIG. 6: FL Studio's BeepMap will convert an image to any number of partials you specify. All of its controls operate in real time.

FL Studio's BeepMap is a synthesis plug-in that first appeared in version 3.0 and that is available in all levels of the program, including the entry-level Express Edition. It uses a graphic image (BMP, JPEG, or PNG) to generate a set of frequencies and amplitudes that control partials using additive resynthesis. In the conversion process, BeepMap assigns values extracted from red pixels to the left channel's amplitude and from green pixels to the right channel's. Yellow results in an equal value for both channels, and you can choose whether the program will ignore blue pixels or use their values to scale each pixel's frequency. There are only a few parameters to modify; for example, you can use one of three types of scales for the partials (logarithmic, linear, and harmonics). You can also determine the number of partials that will be used by specifying the maximum height (in pixel increments) for the image (see Fig. 6).
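
Here's a sketch of BeepMap's color-to-stereo mapping as described above: red sets the left-channel amplitude and green the right, so yellow (full red plus full green) lands in the center. The article doesn't say how blue scales frequency, so the one-octave scaling law below is purely an assumption, as is the pixel_to_partial name.

```python
def pixel_to_partial(r, g, b, base_freq, use_blue=False):
    """Hypothetical helper: map one RGB pixel (0-255 components) to
    (frequency, left_amp, right_amp).

    Red drives the left channel and green the right, so yellow is centered
    and pure blue is silent. When use_blue is set, blue raises the partial's
    frequency; the up-to-one-octave law is an assumption, not BeepMap's.
    """
    left = r / 255.0
    right = g / 255.0
    freq = base_freq * (1.0 + b / 255.0) if use_blue else base_freq
    return freq, left, right
```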

BeepMap has a Length parameter that will adjust the duration of your new sound from a fraction of a second to around 20 seconds (there are no values or increments shown for Length), and you can set the sound to loop or play back as a one-shot. Like other FL Studio generators, all of BeepMap's controls can be modified in real time. You can drag a new graphics file directly onto a button containing a BeepMap generator, even while a sequence is playing back, and integrate it into the current project. You can also send the output of a BeepMap generator to any FL Studio effect; a healthy amount of filtering and reverb can be effective, depending on the source image you're using.

U&I Software MetaSynth 4 (Mac, $499 [MSRP])

MetaSynth virtually defined the field of image-to-sound conversion on the modern desktop. It's long been the premier application in the field, and as it stands, it remains the most robust and versatile among this group. In addition to its graphics-related features, MetaSynth is a massive synthesis powerhouse, with tools for spectral analysis followed by multiple types of resynthesis, spectral morphing, sample processing, a variety of traditional synthesis techniques, effects, and much more. It even has a built-in timeline on which you can sequence sonic events.

FIG. 7: It's easy to import an image file into MetaSynth and filter it, or you can draw a new image from scratch.

MetaSynth is noted for its unique, rather dark and affected interface. But once you get past its exterior, you'll find an intuitive and well-thought-out structure. The program's features are grouped into six main work areas called Rooms. For image-related work, the two of most interest are the Image Synth and Image Filter Rooms. The Image Synth Room is where you load an image (PICT format only) and determine how it will be used to generate a new sound. (You can also draw a new image from scratch; see Fig. 7 and Web Clip 7.) By default, the program will use additive synthesis, and you may find that many of your files initially produce similar-sounding results. But with only a few tweaks, you can switch to FM or granular synthesis, samples, or any arbitrary waveform you want, and that's just the first step of the conversion process.

MetaSynth lets you scale the duration of your new sound file up to a maximum of just under 13 minutes, and you can mix analyses of different source sounds. For example, you could analyze a graphic image, filter it with the spectrum of a vocal sound, then resynthesize it using a flute sample for each partial. The possibilities are limitless.

The Image Filter Room comes with dozens of preset filters, and it's simple to add your own. It's also easy to make pseudovocoder sounds or impose complex polyrhythmic patterns onto a static source sample. Though MetaSynth doesn't work entirely in real time, it does offer full-fidelity real-time previewing on most modern machines. Using a fast computer, you'll find it very comfortable to explore and experiment with the program's numerous processing functions.

Since its inception, MetaSynth has been a Mac-only program. If you're on a PC and really want to check it out, I suggest you do what I did: buy a Mac and give it a try.

VirSyn Poseidon 1.4 (Mac/Win, $279 [MSRP])

Like others in the VirSyn family, Poseidon has a somewhat unusual interface, offering the majority of its controls on only a single screen. The image conversion process, called Analyze Bitmap, is the counterpart to the sound-analysis command, Analyze Sound. Both run from the same menu, and both produce a 2-D (or 3-D if you prefer) spectral plot (see Fig. 8). Though you can't edit the analysis directly, you can determine the way it will be used to generate a new sound. For instance, you can loop through only a small segment of time, adjust the playback speed, modify the number of partials the new sound will contain (from 1 to 512), and determine whether it will include inharmonic or only harmonic partials. You can also map a number of these controls to MIDI data, which allows you to “perform” the modifications in real time.
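
Poseidon's playback controls can be modeled as a pointer that steps through the analysis frames. In this hypothetical sketch, the pointer advances speed frames per output step and wraps within a loop region, so speeds below 1.0 stretch the sound and speeds above 1.0 compress it. Poseidon's real engine likely interpolates more smoothly than this nearest-frame version.

```python
def frame_schedule(n_frames, n_steps, speed=1.0, loop_start=0, loop_end=None):
    """Hypothetical helper: which analysis frame to play at each output step.

    Playback advances `speed` frames per step and wraps around the
    half-open [loop_start, loop_end) region.
    """
    loop_end = n_frames if loop_end is None else loop_end
    if not 0 <= loop_start < loop_end <= n_frames:
        raise ValueError("invalid loop region")
    length = loop_end - loop_start
    pos = 0.0
    frames = []
    for _ in range(n_steps):
        # Truncate to the nearest earlier frame, then wrap inside the loop.
        frames.append(loop_start + int(pos) % length)
        pos += speed
    return frames
```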

FIG. 8: Poseidon lets you choose how you want to display the spectrum of the image you import. The 3-D waterfall view is shown here.

Poseidon, like Cameleon, doesn't create a new sound file directly from an image but uses the analysis data, regardless of its source, as the core of a synth patch. You can save the data in Poseidon's VRD file format, then freely substitute the spectral analysis that a preset patch uses for data generated by the image. So, for example, if you have a vocal patch that uses envelopes and other parameters characteristic of vocal sounds, you could substitute the analysis of an image for the underlying sound that the vocal patch uses, thereby applying the vocal characteristics to the image-generated sound.

Poseidon runs as both a standalone application and a VST plug-in, and like other VirSyn software, it includes only fairly basic documentation. You'll also need a Syncrosoft dongle just to try out the demo of the program. But like Cameleon, Poseidon is a very good-sounding professional soft synth. With dozens of sound-editing options, you should find more than enough ways to tweak your image conversions into very musical results regardless of what you start with.

Associate Editor Dennis Miller is a composer and animator. Check out his work online.

Izzy Gets Down

Mark Coniglio is the mastermind behind Isadora ($350), a real-time video- and audio-programming and performance environment that runs on both Mac and PC. Coniglio uses the software in his work as artistic codirector of the innovative New York-based dance company Troika Ranch and has spent many hundreds of hours refining the system for real-time use.

Like Reaktor and other modular programming environments, Isadora offers a large number of video and audio modules (called Actors) that you connect on the main work area. A basic configuration might consist of a QuickTime movie running via the MoviePlayer, outputting to a video-processing module (a rectangular tiler, for example), which is then connected to the Projector Actor for desktop viewing. You could also enable output to an external device and capture it to a DV video camera or project it onto a large screen.

FIG. A: This image shows the main work area in Mark Coniglio's Isadora. The program's Actors (modules) appear on the left and are connected on the right. This configuration uses Actors to track the frequency and amplitude of an incoming sound source and map them to parameters controlling the amount of displacement that is applied to the source movie file. The final output is shown at the bottom right of the screen.

But things can get a lot more interesting really quickly. For instance, using the Sound Frequency and Sound Level Watchers, you could map the amplitude level of a user-defined frequency band of an incoming audio signal to one or more parameters of a video effect or image generator (see Fig. A). That way, the loudness of the incoming audio could control the amount of displacement in a displacement module or determine the number of particles in a particle animation created in real time by the program.
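
Isadora's Sound Frequency Watcher is a ready-made module, and the article doesn't describe its internals. As a stand-in, this sketch measures the level at one frequency with the Goertzel algorithm (a cheap single-bin DFT) and then scales that level into a parameter range, such as an amount of displacement. The function names and the 0 to 100 displacement range are hypothetical.

```python
import math

def band_level(samples, target_freq, sample_rate):
    """Hypothetical helper: level of one frequency via the Goertzel algorithm.

    target_freq is rounded to the nearest DFT bin for this block length; a
    full-scale sine centered on that bin measures about 1.0.
    """
    n = len(samples)
    k = round(n * target_freq / sample_rate)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of DFT bin k, from the last two filter states.
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2 * math.sqrt(max(power, 0.0)) / n


def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Map value from [in_lo, in_hi] to [out_lo, out_hi], clamped at the ends."""
    t = min(max((value - in_lo) / (in_hi - in_lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)
```

Called once per audio block, band_level feeds scale, and the result drives the video effect, for example scale(level, 0.0, 1.0, 0.0, 100.0) for a displacement amount. A real patch would also smooth the value over time to avoid jitter.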

There are dozens of modules for mixing, generating, and processing both images and audio, as well as support for the FreeFrame plug-in standard, so you can add additional commercial or free video-processing plug-ins, too. You'll also find math and I/O modules, and you can build your own interfaces using elements such as sliders, knobs, and dials. The best part is that you can freely mix modules of nearly any type, using data extracted from an audio signal as the input to a video-effect parameter or vice versa.

Even if you only want to work with audio modules, you can design a wide range of networks in which elements interact in innovative and unusual ways. For example, using the Core version of the program (which adds support for AU plug-ins), you could track the frequency output of one audio file, then use that information to determine the amount of delay or perhaps the pan position (or both) of a second sound. The same information, perhaps scaled by some factor, could also control another audio-effect parameter and of course a video effect simultaneously.

Isadora is supported by a terrific manual (PDF only), an active users forum, and a number of getting-started tutorials. If you want to explore combining video and audio or are simply looking for unique ways to interconnect audio parameters, then give it a try. A free trial version, as well as educational pricing ($275) on the commercial version, makes it well worth a look.
