Phone It In!

Thanks to the audiocentric nature of cell phones, customizable ringtones have rapidly become one of the first commercially successful entertainment features

Thanks to the audiocentric nature of cell phones, customizable ringtones have rapidly become one of the first commercially successful entertainment features for mobile devices. From a practical standpoint, a personalized ringtone lets you identify your ringing phone when you're in a group of other cell-phone owners. And with a little preplanning, different ringtones can also be used to identify different callers. These benefits — along with the coolness factor of owning a custom ringtone — have fed the commercial success of the ringtone industry.

Customizable ringtones started to appear in 1998 and became a common feature in cell phones by early 2000. The early ringtone technology, however, supported only short monophonic melodies, so several companies and industry groups began developing new formats and technologies for playing back polyphonic ringtones and video-game audio on mobile devices. Polyphonic cell phones were starting to appear in Japan by 2000, and in Europe and North America slightly later.

There are currently many ringtone formats, and almost all are based on sequences rather than on actual audio recordings. That's because of the low 9.6 Kbps bit rate that is typically used for downloading data to phones. It takes less bandwidth (and is therefore less costly) for the end-user to download a small sequencer file that plays through a phone's built-in tone generator than to download an MP3 file of similar song length. (Some newer phone models are an exception; they support “True Tones,” which are actual MP3, WAV, AAC, or AMR format song recordings that function as ringtones.)
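To put numbers on that bandwidth argument, here is a back-of-the-envelope calculation in Python. The file sizes are illustrative assumptions (a 150-byte text sequence versus a 30-second MP3 at 128 Kbps), and the sketch ignores protocol overhead, so real downloads take longer.

```python
def download_seconds(size_bytes: int, bit_rate_bps: int = 9600) -> float:
    """Raw transfer time at a given bit rate, ignoring protocol overhead."""
    return size_bytes * 8 / bit_rate_bps

# Illustrative sizes (assumptions, not measurements).
sequence_bytes = 150              # a short text-based ringtone sequence
mp3_bytes = 30 * 128_000 // 8     # 30 seconds of 128 Kbps MP3 = 480,000 bytes

print(f"sequence: {download_seconds(sequence_bytes):.3f} s")  # 0.125 s
print(f"MP3:      {download_seconds(mp3_bytes):.0f} s")       # 400 s
```

Even before overhead, the MP3 ties up the connection for nearly seven minutes, while the sequence arrives in a fraction of a second.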

The industry is now moving from monophonic ringtone support toward polyphonic capability. Ringtone vendors seek songs in both polyphonic and monophonic versions, so composers should be familiar with several formats. The challenge of composing ringtones is in understanding the various formats and the authoring constraints required by mobile devices with limited CPUs, relatively low audio fidelity, and slow download speeds. (For more on acquiring ringtones, see the sidebar “Ringtone Retrieval.”)


Two monophonic formats have emerged as industry standards. RTTTL (Ring Tone Text Transfer Language), developed by Nokia, was the first downloadable ringtone format. Another format, iMelody, established by the IrDA (Infrared Data Association), was adopted by Ericsson, Motorola, and Siemens, making it the first industry-standard cross-platform ringtone format.


RTTTL and RTX (an extended version of RTTTL) are text-based formats for describing monophonic melodies. The following example shows the Flintstones theme converted into RTX format:

Flintstone: d=4, o=5, b=200: g#, c#, 8p, c#6, 8a#, g#, c#, 8p, g#, 8f#, 8f, 8f, 8f#, 8g#, c#, d#, 2f, 2p, g#, c#, 8p, c#6, 8a#, g#, c#, 8p, g#, 8f#, 8f, 8f, 8f#, 8g#, c#, d#, 2c#

The RTX format has three sections, separated by colons. The title section holds the ten-character name “Flintstone”. The head section defines default values for duration (d), octave (o), and tempo (b). Any note in the sequence section that doesn't specify a duration or octave inherits these values, helping to reduce the number of characters in the sequence and the resulting file size of the ringtone.

Each event in the sequence section consists of a duration, pitch name, accidental (#), and octave, in that order. For example, the fourth event, c#6, means a quarter note (the default subdivision) C-sharp in octave 6. A “p” represents a rest. Rhythmic durations range from 1 (a whole note) to 32 (a 32nd note). You can learn more about the RTX specification at
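The default-value rule is easy to see in code. Here is a minimal Python sketch of an RTTTL reader that handles the example above; it is not a full implementation of the specification. It assumes that b counts quarter-note beats per minute, accepts an optional dot (dotted note) before or after the octave, and falls back on d=4, o=6, b=63 when the header omits a value; those fallbacks are an assumption, not something stated in this article.

```python
import re

# [duration] pitch [#] [.] [octave] [.]; a dot marks a dotted note
EVENT = re.compile(r"^(\d{1,2})?([a-gp]#?)(\.)?(\d)?(\.)?$")

def parse_rtttl(text):
    """Return (name, events), where each event is (pitch, octave or None, length_ms)."""
    name, header, body = (part.strip() for part in text.split(":", 2))
    defaults = {"d": 4, "o": 6, "b": 63}    # assumed fallbacks if the header omits a key
    for item in header.split(","):
        key, value = item.split("=")
        defaults[key.strip().lower()] = int(value)
    whole_ms = 4 * 60_000 / defaults["b"]   # b = quarter-note beats per minute
    events = []
    for raw in body.split(","):
        m = EVENT.match(raw.strip().lower())
        if not m:
            raise ValueError(f"bad RTTTL event: {raw!r}")
        dur, pitch, dot1, octave, dot2 = m.groups()
        length = whole_ms / int(dur or defaults["d"])   # inherit default duration
        if dot1 or dot2:
            length *= 1.5                                # dotted note
        oct_value = None if pitch == "p" else int(octave or defaults["o"])
        events.append((pitch, oct_value, length))
    return name, events

flintstone = ("Flintstone: d=4, o=5, b=200: g#, c#, 8p, c#6, 8a#, g#, c#, 8p, g#, 8f#, 8f, 8f, "
              "8f#, 8g#, c#, d#, 2f, 2p, g#, c#, 8p, c#6, 8a#, g#, c#, 8p, g#, 8f#, 8f, 8f, "
              "8f#, 8g#, c#, d#, 2c#")
name, events = parse_rtttl(flintstone)
print(name, len(events), events[3])   # Flintstone 35 ('c#', 6, 300.0)
```

With b=200, a quarter note lasts 300 ms, so the whole theme runs 9 seconds.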


The iMelody format is also text based and offers additional features, such as individual note volumes, a wider range of rhythmic subdivisions, a wider range of octaves, and sharp/flat accidentals. Here's an example of a theme by Mozart written in iMelody format:

NAME: Melody1
BEAT: 120
MELODY: &b2#c3V-c2*4g3d3V+#d1r3d2e2:d1V+f2f3.

The melody section contains a sequence defining an event's volume, octave, accidental, note name, and duration in that order. Notes without specific octave or volume values inherit the default values, helping to reduce file size. Rhythmic values range from 0 (whole note) to 5 (32nd note). The following is the sequence rewritten with each event separated by a vertical line.

&b2 | #c3 | V-c2 | *4g3 | d3 | V+#d1 | r3 | d2 | e2: | d1 | V+f2 | f3.

The first event plays a B-flat (the ampersand designates a flat) quarter note. The fourth event plays a G eighth note in octave 4; the asterisk (*) designates a new octave. In the third event, V- means decrease the volume by one value; the V+ in the 11th event increases the volume by one value. For more information on the iMelody format, visit the IrDA Web site (
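Because iMelody events run together without separators, software has to scan them field by field. The following minimal Python sketch splits the MELODY line above into the same twelve events. It recognizes only the fields described here, plus r for a rest and a trailing ., :, or ; that it treats as a duration modifier (an assumption based on the e2: and f3. events in the example). It does not interpret the volume or octave changes, and any other syntax raises an error.

```python
import re

# [V+ or V-] [*octave] [& or #] note-or-rest duration(0-5) [modifier]
EVENT = re.compile(r"(V[+-])?(\*\d)?([&#])?([a-gr])([0-5])([.:;])?")

def tokenize_imelody(melody):
    """Split an iMelody MELODY string into event tokens; fail on anything unrecognized."""
    tokens, pos = [], 0
    while pos < len(melody):
        m = EVENT.match(melody, pos)   # every event must start exactly where the last ended
        if m is None:
            raise ValueError(f"unrecognized data at offset {pos}: {melody[pos:]!r}")
        tokens.append(m.group(0))
        pos = m.end()
    return tokens

events = tokenize_imelody("&b2#c3V-c2*4g3d3V+#d1r3d2e2:d1V+f2f3.")
print(" | ".join(events))
# &b2 | #c3 | V-c2 | *4g3 | d3 | V+#d1 | r3 | d2 | e2: | d1 | V+f2 | f3.
```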


By now you are probably asking, “Do I really have to compose music in this cryptic format?” Fortunately, the answer is no. There are several applications that convert MIDI into different monophonic ringtone formats. However, by understanding these formats, you can manually edit the ringtone to add or edit data that was not included during conversion.

When you prepare monophonic ringtones, a vendor or network operator should provide the maximum character length of each ringtone format's sequence. You must optimize the sequence for each ringtone format according to the limits provided by the vendor or network operator.
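For RTTTL, one easy optimization is picking the header defaults that save the most characters. This minimal Python sketch re-encodes a note list with every combination of default duration and octave and keeps the shortest result. The note list is the first phrase of the Flintstones example, and VENDOR_LIMIT is a hypothetical placeholder for the figure your vendor or operator supplies; the sketch omits spaces, which RTTTL does not require.

```python
from itertools import product

def encode_rtttl(name, notes, d, o, bpm):
    """Serialize (duration, pitch, octave) notes, omitting values equal to the defaults."""
    events = []
    for dur, pitch, octave in notes:
        event = ("" if dur == d else str(dur)) + pitch
        if pitch != "p" and octave != o:   # rests carry no octave
            event += str(octave)
        events.append(event)
    return f"{name}:d={d},o={o},b={bpm}:" + ",".join(events)

def shortest_rtttl(name, notes, bpm):
    """Try each default duration/octave pair and keep the shortest encoding."""
    return min((encode_rtttl(name, notes, d, o, bpm)
                for d, o in product((1, 2, 4, 8, 16, 32), (4, 5, 6, 7))), key=len)

# First phrase of the Flintstones example; rests use None for the octave.
phrase = [(4, "g#", 5), (4, "c#", 5), (8, "p", None), (4, "c#", 6), (8, "a#", 5),
          (4, "g#", 5), (4, "c#", 5), (8, "p", None), (4, "g#", 5), (8, "f#", 5),
          (8, "f", 5), (8, "f", 5), (8, "f#", 5), (8, "g#", 5), (4, "c#", 5),
          (4, "d#", 5), (2, "f", 5), (2, "p", None)]

VENDOR_LIMIT = 160   # hypothetical; use the figure your vendor or operator provides
best = shortest_rtttl("Flintstone", phrase, 200)
print(len(best), len(best) <= VENDOR_LIMIT, best)
```

For this phrase, d=4 and d=8 tie (eight quarter notes, eight eighth notes), so the sketch keeps the first it tries; o=5 wins easily because only one note sits in octave 6.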


The ringtone industry is now moving away from the cryptic and limiting monophonic ringtones toward MIDI-based polyphonic ringtones. Several companies, such as Beatnik, Faith, Tao, and Yamaha, license sophisticated synthesis technology to leading phone manufacturers. In addition to supporting MIDI playback through a General MIDI (GM) bank set, some newer phones also support “structured audio” formats that include both MIDI sequence data and custom sounds.


MIDI is the most widely supported polyphonic ringtone format. In most cases, phone manufacturers use a software synthesizer to play back MIDI files; each phone has a different CPU capacity and therefore a different level of polyphony. This presents a problem for the composer who is left wondering, “What happens to my ringtone if a given phone doesn't have enough polyphony to play it?”

The SP-MIDI (Scalable Polyphony MIDI) specification solves this problem by enabling the composer to create a single version of a song and set up rules so that a phone supporting 4-note polyphony, for example, could play up to four preselected musical parts from a 16-part song, while a more sophisticated phone might play all of the parts. The scalable aspect of SP-MIDI enables a composer to create a song that plays in a predictable way in a variety of polyphony-limited situations.
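In SP-MIDI these rules travel in the file as a Maximum Instantaneous Polyphony (MIP) message: channels are listed in priority order, each paired with the number of notes needed to play it together with every higher-priority channel. A phone then plays the channels that fit within its polyphony and mutes the rest. The Python below is a minimal sketch of that selection logic only; the channel table is hypothetical, and the sketch does not parse or build the actual MIP message.

```python
def channels_to_play(mip_table, polyphony):
    """Return the channels a device can play, given its note polyphony.

    mip_table lists (channel, cumulative_mip) in priority order, where cumulative_mip
    is the peak note count of that channel plus all higher-priority channels.
    """
    playable = []
    for channel, cumulative_mip in mip_table:
        if cumulative_mip > polyphony:
            break          # this channel and every lower-priority one are muted
        playable.append(channel)
    return playable

# Hypothetical song: melody, drums (GM channel 10), bass, chords, pad.
mip_table = [(1, 1), (10, 3), (2, 4), (3, 8), (4, 16)]

for voices in (4, 8, 24):
    print(voices, channels_to_play(mip_table, voices))
# 4 [1, 10, 2]
# 8 [1, 10, 2, 3]
# 24 [1, 10, 2, 3, 4]
```

Because the cumulative values never decrease, stopping at the first channel that exceeds the limit gives the same result as checking every channel, and the composer knows in advance exactly which parts a 4-note phone will drop.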

The MMA ratified the SP-MIDI specification in May 2002, and SP-MIDI-compatible phones appeared on the market in mid-2002. Currently, SP-MIDI is the most common polyphonic ringtone format in Europe. Polyphony support in current phones ranges from 4 to 24 notes.

Creating effective SP-MIDI compositions requires careful voice management. Overlapping notes, sustain-pedal controller data, and sounds with slow releases can unintentionally drain a channel's available polyphony. In that case, a phone without sufficient polyphony would resort to note stealing, causing unpredictable note dropouts during playback. You must therefore analyze a composition carefully to avoid any hidden polyphony. For more detailed information, the complete SP-MIDI specification is available from the MIDI Manufacturers Association (
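You can check for hidden polyphony before a phone does. The Python below is a minimal sketch that assumes the note events for one channel have already been pulled out of the MIDI file into simple (time, kind, value) tuples, an illustrative format rather than any MIDI library's. It turns note-ons, note-offs, and sustain-pedal changes into sounding intervals, optionally pads each note with a release tail, and reports the peak number of simultaneous voices.

```python
def voice_intervals(events):
    """Convert time-ordered (time_ms, kind, value) events into (start, end) voice intervals.

    kind is "on"/"off" with a key number, or "pedal" with True (down) / False (up).
    A key released while the pedal is down keeps sounding until the pedal lifts.
    """
    held, sustained, intervals = {}, [], []
    pedal = False

    def release(key, t):
        start = held.pop(key)
        if pedal:
            sustained.append(start)     # pedal keeps this voice alive
        else:
            intervals.append((start, t))

    for t, kind, value in events:
        if kind == "on":
            if value in held:           # re-struck key: treat as an implicit note-off
                release(value, t)
            held[value] = t
        elif kind == "off" and value in held:
            release(value, t)
        elif kind == "pedal":
            if pedal and not value:     # pedal lifted: end every sustained voice
                intervals.extend((start, t) for start in sustained)
                sustained.clear()
            pedal = bool(value)

    end = events[-1][0] if events else 0    # close anything still sounding
    intervals.extend((start, end) for start in list(held.values()) + sustained)
    return intervals

def peak_polyphony(intervals, release_ms=0):
    """Sweep-line count of overlapping voices; release_ms extends each note's tail."""
    edges = [(start, 1) for start, _ in intervals]
    edges += [(end + release_ms, -1) for _, end in intervals]
    edges.sort()                        # at equal times, note-ends (-1) come first
    current = peak = 0
    for _, step in edges:
        current += step
        peak = max(peak, current)
    return peak

legato = [(0, "on", 60), (500, "off", 60), (500, "on", 62),
          (1000, "off", 62), (1000, "on", 64), (1500, "off", 64)]
pedaled = [(0, "pedal", True)] + legato + [(1600, "pedal", False)]

print(peak_polyphony(voice_intervals(legato)))                  # 1
print(peak_polyphony(voice_intervals(legato), release_ms=200))  # 2
print(peak_polyphony(voice_intervals(pedaled)))                 # 3
```

The legato line needs one voice on paper, but a 200 ms release tail doubles that and a held sustain pedal triples it, which is exactly the hidden polyphony described above.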

To create an SP-MIDI-compliant MIDI file, you can use an SP-MIDI-authoring application. One application for authoring SP-MIDI is Beatnik's Mobile Sound Builder (see Fig. 1), which can test a MIDI file under different polyphony limitations and audition the music with sounds similar to those in Nokia, Sony Ericsson, Siemens, Motorola, Samsung, and Danger phones. All of these manufacturers license their software synthesizer and sound banks from Beatnik Inc. The Mobile Sound Builder is available from the Beatnik Web site (


Another MMA standard, ratified in November 2001, is XMF (Extensible Music Format), a structured audio format combining MIDI sequence data and custom wavetable sounds in DLS (Downloadable Sounds) format. XMF supports encryption (to protect MIDI data), wavetable sample data, and copyright information. It also compresses the DLS bank by 25 to 50 percent, aiding in file-size reduction. An XMF file can use both its own custom DLS sounds and the GM sounds of the host synthesizer.

Because XMF is a nonproprietary format adopted by the MMA, it's likely to become another important mobile-audio format. To create XMF, you need a DLS editor for custom sound creation and software to merge MIDI and DLS. Currently, the only available XMF tool is Beatnik's Mobile Sound Builder.

Beatnik RMF

Beatnik's RMF (Rich Music Format) is a proprietary format that has many of the same features as XMF. In addition, RMF supports ADPCM 4:1 compression and MP3 compression of wavetable samples. However, only phones that support the Beatnik Audio Engine can play this format. Creating RMF files requires the Beatnik Editor, another commercial tool available from the Beatnik Web site. Phones supporting RMF and XMF became available in late 2003.

Tao Intent

The Tao Group licenses its Tao Intent software to a number of mobile device manufacturers. Unlike the other technologies described in this article, the Tao Intent platform marries a sophisticated audio engine with a complete multimedia system.

The Tao Intent Sound System (ISS) supports many standard formats, including MIDI, SP-MIDI, and the new SKM format. SKM is an open format that merges MIDI and custom sampled sounds with vector audio.

Tao's vector audio is a text-based format that includes parameters for controlling software synthesizers, effects units, and music engines, such as the Koan generative music engine, allowing the generation of music or sound in real time. Tao is currently marketing the technology to deliver dynamic ringtones. These “live tones” sound musically and sonically different each time they play, allowing for subtle variety and a bit of novelty, without producing large files.

The Tao ISS content authoring tools are available from SSEYO (, which is a subsidiary of Tao.

Faith MFi

Japan-based Faith's audio engine is the basis for several important ringtone formats. Faith developed a subformat of MIDI called compact MIDI (cMIDI), which reduces the range of allowed MIDI data, thereby decreasing a ringtone's file size.

In 1999, Faith proposed and developed MFi (Melody Format for i-mode), the first widespread polyphonic ringtone format, for the NTT DoCoMo i-mode network (currently the largest phone network in the world, with nearly 40 million subscribers). All phones that are branded for i-mode service can support the MFi format, which contains cMIDI and custom samples. The MFi format and related authoring software are owned by NTT DoCoMo, and information about its capabilities is available only to content providers licensed by NTT DoCoMo.

In 2000, Faith and Qualcomm jointly developed the CMX (Compact Media Extensions) format for Qualcomm chip sets. CMX, a multimedia format for synchronized sound samples, MIDI, graphics, and text, can be used for phone ringtones, screen savers, or messaging. Authoring tools for CMX are available from Qualcomm's CMX division ( CMX-capable phones are available in Japan and the United States and should become available for Europe in the near future.

Yamaha SMAF

SMAF (Synthetic Music Mobile Application Format) is an advanced structured audio format that is the most common polyphonic ringtone format in East Asia, with a growing market in Europe and the United States. The format combines FM synthesis data, MIDI, samples, wavetable synthesis data, and other media, such as graphics, into one convenient format.

Currently, Yamaha manufactures four SMAF-compatible chips. When developing SMAF content, it's important that you understand the capabilities of the chip in your target audience's phone. The MA-1 chip has a maximum polyphony of four notes and plays SMAF through a GM-compatible FM synthesizer that also supports custom FM sounds with two oscillators. SMAF files for MA-2 chips can have 16-note polyphony and play built-in FM sounds from the GM bank and a second bank of 128 sounds. MA-2-compatible SMAF files support custom FM sounds with up to four oscillators, and they can include WAV samples with 4:1 compression. The MA-3 chip includes all of the features of the MA-2 chip. It boasts 40-note polyphony and custom sounds using FM or wavetable synthesis. Yamaha's most recent and most sophisticated chip, the MA-5, adds support for analog synthesis and human-voice synthesis.

A sophisticated feature of SMAF MA-2/3/5 chips is support for compressed audio samples and FM tones. FM synthesis is highly efficient for mobile devices, because a 4-oscillator FM patch occupies 30 bytes of memory. Consequently, you can add multiple custom FM sounds without a significant increase in file size. You can also include vocal or drum-loop samples with 4:1 compression to minimize ringtone file size.

Yamaha provides several free authoring applications from its Yamaha SMAF Global Web site along with a commercial professional-level hardware authoring system. The free tools support the conversion of type 0 Standard MIDI Files to SMAF files, the creation of custom FM sounds, and the inclusion of WAV samples (see Fig. 2).

The commercial hardware tool (model MMFMA3ASE) is a tone module with a software front end. The hardware provides all the necessary ins and outs: connection to a handset speaker for monitoring, LED indication, and line-out. The software has tools for voice editing, voice management, and MIDI-to-SMAF conversion with WAV-sample inclusion and file/data size indicators. You can learn more about creating SMAF content from the Yamaha SMAF Global Web site (

Sonic Network EAS

Sonic Network, Inc. (maker of Sonic Implants sound libraries) has gone mobile with its Embedded Audio Synthesis (EAS) technology. The system supports digital audio content, DLS files, and MIDI and provides on-demand interactive audio playback for mobile phones. The company's customizable EAS technology consists of a digital audio player, GM synthesizer with wavetable soundsets, and multimedia extensions for several ringtone formats, including GM, SP-MIDI, SMAF-MA2, and CMX.

Sonic Network also offers GrooveFone, an interactive musical program that lets you remix songs using the buttons on your cell phone. With GrooveFone you can change drum beats, bass parts, harmonies, and melodies, and then save your song as a customized ringtone. You can find out more at


When adding custom audio samples to ringtones, be sure to minimize the file sizes. Many network operators have a limit of 10 to 60 KB for ringtones. With careful downsampling, waveform editing, and compression, it's possible to improve the quality of a ringtone with one or two custom samples while remaining within this limit. (For more information on ringtone technical limitations, see the sidebar “Ringtone Restrictions.”)
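A quick budget shows how little sampled audio fits under these limits. This Python sketch assumes 1 KB = 1,024 bytes, 16-bit mono source audio, and a 4:1 compression ratio measured against that 16-bit data; the sequence size and patch count in the example are hypothetical, while the 30-byte FM patch figure comes from the SMAF discussion earlier in this article.

```python
FM_PATCH_BYTES = 30   # approximate size of a 4-oscillator SMAF FM patch (from the text)

def max_sample_seconds(limit_kb, sequence_bytes, fm_patches=0,
                       sample_rate=8000, bits=16, ratio=4):
    """Seconds of compressed mono sample audio that fit in what's left of the limit."""
    budget = limit_kb * 1024 - sequence_bytes - fm_patches * FM_PATCH_BYTES
    if budget <= 0:
        return 0.0
    bytes_per_second = sample_rate * bits / 8 / ratio   # 4,000 bytes/s at 8 kHz, 16-bit, 4:1
    return budget / bytes_per_second

# Hypothetical ringtone: 2 KB of sequence data plus eight custom FM patches.
for limit in (10, 30, 60):
    print(limit, "KB ->", round(max_sample_seconds(limit, 2048, fm_patches=8), 2), "s")
```

Under a 10 KB limit that leaves about two seconds of sampled audio; moving to 11 kHz or skipping compression shrinks the figure proportionally, while each extra FM patch costs almost nothing.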

There are also many aesthetic challenges for composing effective ringtones. For example, it's important to choose sounds carefully, because not all sounds project well from inside a phone owner's purse or coat pocket. Another problem is that instrumental arrangements of pop songs can sound like canned background music, leaving the phone owner to wonder, “Is that my phone ringing, or is that background music playing in the store?”

The best way to avoid these problems is to test your ringtones through an actual phone in a variety of locations and situations. Many of the ringtone formats also support vibration and LED events, which are additional forms of alert. Including those events in your ringtone can work around some ringtone-audibility problems.

Hayden Porter is a Web developer and musician specializing in sound for new media. He is also the editor of, focusing on Web and wireless audio.


You can audition a ringtone in a phone by manually entering monophonic song data using the phone's built-in ringtone composer or by transferring the ringtone file from a computer to your phone through an infrared light beam or through a USB or serial data cable.

You can send ringtones to other phones by attaching them to a message and sending the message to the other phone. You can also download ringtones into your phone from a Web site that is accessible to your phone's WAP (Wireless Application Protocol) browser.


When creating cell-phone ringtones, it's important to keep several technical limitations in mind to ensure the best possible results.

9.6 Kbps download bit rate

Cell phones use a much slower bit rate than the typical dial-up Internet connection. Consequently, ringtones should be as small as possible to facilitate rapid delivery to end-users.

10 to 60 KB network limit on ringtone file size

Phone network operators often limit the size of ringtone files to 10 to 60 KB — sometimes even less. Also, phones have a limited amount of storage space and may have a limit on the file size of a stored ringtone.

8 and 11 kHz sampling-rate output

Phones have limited digital-to-analog converter capability because of CPU constraints. Be sure to downsample all sound files to these sampling rates for maximum delivery and playback efficiency.
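When the source rate is an integer multiple of the target, the simplest downsampler averages each block of samples and keeps one value per block. The Python below is a minimal sketch of that idea for 44.1 kHz material (44,100 / 4 = 11,025 Hz). Block averaging is only a weak anti-aliasing filter, so treat this as an illustration; an audio editor's resampler will sound cleaner, and converting 44.1 kHz to 8 kHz needs a fractional ratio this sketch does not handle.

```python
def decimate(samples, factor):
    """Downsample by an integer factor, averaging each block as a crude low-pass filter.

    Any trailing partial block is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]

# One second of (silent) 44.1 kHz audio becomes one second at 11.025 kHz.
print(len(decimate([0.0] * 44_100, 4)))           # 11025
print(decimate([1, 1, 1, 1, 2, 2, 2, 2, 9], 4))   # [1.0, 2.0]
```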

300 to 3,000 Hz phone-speaker frequency range

Cell-phone speakers are quite small and have a limited bass response. To emulate the playback of a small piezo speaker, try making a recording of the ringtone using EQ to remove the frequencies outside this range.
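If you'd rather do that band-limiting in code than in an editor, two second-order filters get you close. The Python below is a minimal sketch using the low-pass and high-pass biquad formulas from Robert Bristow-Johnson's widely used “Audio EQ Cookbook,” with a Butterworth Q of about 0.707. The 300 Hz and 3,000 Hz corners come from the range above; the filter order and Q are assumptions, and a real piezo speaker's response is bumpier than any two-filter model.

```python
import math

def biquad(kind, f0, fs, q=1 / math.sqrt(2)):
    """Normalized low-/high-pass biquad coefficients (RBJ Audio EQ Cookbook)."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    elif kind == "highpass":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
    else:
        raise ValueError(kind)
    a0 = 1 + alpha
    return [c / a0 for c in b], [-2 * cos_w0 / a0, (1 - alpha) / a0]

def run_filter(coeffs, x):
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out

def phone_speaker(x, fs):
    """Keep roughly 300-3,000 Hz: high-pass at 300 Hz, then low-pass at 3 kHz."""
    return run_filter(biquad("lowpass", 3000, fs),
                      run_filter(biquad("highpass", 300, fs), x))

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def gain(freq, fs=44_100):
    """RMS out/in for a 1-second sine, skipping the first 100 ms of filter settling."""
    tone = [math.sin(2 * math.pi * freq * n / fs) for n in range(fs)]
    out = phone_speaker(tone, fs)
    return rms(out[fs // 10:]) / rms(tone[fs // 10:])

for f in (100, 1000, 10_000):
    print(f, "Hz:", round(gain(f), 2))
```

A 1 kHz tone passes almost untouched, while 100 Hz and 10 kHz tones drop to a small fraction of their level, which is roughly what the ringtone will lose through a phone speaker.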