Masterclass: Using Game Audio Middleware

Tools For Designing Interactive Sounds

Understanding tools for designing interactive sounds

Web Bonus: Interested in digging deeper? Links to interactive learning materials, and a special exclusive offer for Electronic Musician readers, follow the main text.

WE ALL know that videogames are a huge business, which brings new opportunities for composers and sound designers. Maybe you’re a complete newcomer to sound for games, or perhaps you are already creating audio for traditional, linear media like film or television; but if you’re considering taking the leap into game sound, first you need to understand the process and tools. In this feature, we’ll break down middleware (how it works and what options are available), but let’s start with the notion of designing audio for an interactive medium.

Imagine a linear medium, such as a film: If a character goes into the dark, spooky castle and opens the door at 13 minutes and 22 seconds, you can easily create or obtain a sound of a creaking door and place it on the timeline in your DAW at that exact point. Once you synchronize it with the film, it will always play at the same time.

Now imagine that each time you watch this film, the character goes into the castle at a different time. This is a prime example of the unpredictability of the game environment. How could you successfully create sounds for a medium when you don’t know when a particular action is going to happen? You need to move away from using time as a basis for organizing sounds and concentrate on the actions themselves. Let’s think of our spooky castle situation from the perspective of the action: At some point, the character is going to come up to the castle and open the door. Let’s list it as an action, like this:

Action #001 Spooky Door Opening > Play ‘spookydoor.wav’

Now we’ve defined our action. How do we trigger it? In a movie, we may be out of luck, but fortunately in the game, something (most likely animation code) is going to cause the door to move. Hook up the code that triggers the door animation with the sound of a creaky door and voilà! Instant sync: whenever the character opens the door, the sound will play.
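To make the idea concrete, here is a minimal Python sketch of an action table wired to a door animation. Everything in it is an assumption made for illustration: the names `ACTION_SOUNDS`, `trigger_action`, and `open_door` are hypothetical, and `play_sound` only records the file name where a real game would call its audio engine.

```python
from typing import Dict, List

# Hypothetical action table: action name -> audio file to play.
ACTION_SOUNDS: Dict[str, str] = {
    "spooky_door_opening": "spookydoor.wav",
}

# Stand-in for a real audio device: we only record what would be played.
played: List[str] = []


def play_sound(filename: str) -> None:
    """Pretend to play a file; a real game would call its audio engine here."""
    played.append(filename)


def trigger_action(action: str) -> None:
    """Look up the sound attached to an action and play it, if one exists."""
    filename = ACTION_SOUNDS.get(action)
    if filename is not None:
        play_sound(filename)


def open_door() -> None:
    """Hypothetical animation hook: move the door, then fire the matching action."""
    # ... door animation code would run here ...
    trigger_action("spooky_door_opening")


# The sound follows the action, no matter when the player opens the door.
open_door()
```

The point of the sketch is that nothing refers to a timeline position. The sound is keyed to the action, so it stays in sync whenever the door opens.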

This shift in thinking requires that each sound in a game exists as a unique element. Except in certain situations, we can’t deliver a single premixed track of sound effects and music anymore. Everything has to be mixed and mastered separately. Furthermore, we have to be really organized with all of these audio files (huge AAA adventure games can contain hundreds of thousands of files!) so that the programmer knows what to do with these assets in the game.

It also means that the way the audio is triggered in a game is intimately tied up with the way the game is designed. Each game is a complete universe unto itself: It has its own rules for emergent behavior and interactivity, and any change in game design can significantly affect the way the sounds are triggered.

Meet the Middleman: Middleware

A typical game engine has an awful lot of things to do. In essence, it has to run an entire virtual world, complete with animations, shading and rendering, visual effects, and of course, a lot of audio. It must coordinate the overall interactive logic of a game. It needs to know both the location of rich media assets (including music, sound effects, and voiceover files) and when (and when not) to call them up. Even in a small game, this can add up to a large amount of data that needs to be coordinated. In a large game the scope can extend into tens or hundreds of thousands of assets. Game engines are software packages that make games possible, and it takes a talented and dedicated group of designers to program the engine to do all this work efficiently.

Image: Middleware, such as Tazman Audio's Fabric, is a bridge between the game sound designer and the programmer.

Back in the day, a sound designer made noises and delivered those noises to a programmer in some file format or another, and that programmer put those files in the game in some way or another. Since this was a one-to-one relationship, it was usually a pretty efficient system.

As games got more complex and teams were often far apart, it became standard practice to deliver those noises to a programmer, perhaps over the Internet, along with a text document listing file names and instructions about where the sounds were supposed to go and what they were supposed to do. (This process is still in use today.) Over time, however, software has emerged to handle an increasingly large share of the heavy lifting between the game engine, the programmer, and the sound designer. This software is called middleware.

Audio middleware is a type of engine that works along with the core game engine and sits, in a sense, in between the sound designer and the programmer. Its main job is to allow the designer, who may not be a programmer, to have more control over how, when, and where his or her sounds are triggered in a game.

Let’s look at a car-racing game. A car has a motor and when the car goes faster, the sound of the motor changes in different ways—RPMs increase, gears shift, and the engine sound changes pitch and intensity. Let’s say the designer makes three sounds: car sound slow, car sound medium, and car sound fast. In the past, the programmer or integrator would take these three sounds and write lines of code that would trigger them, in real time, in the game as the car got faster. The programmer would need to take the time to program the sounds, adjust them, and make sure they worked correctly during gameplay, and while they were doing that, they could not be doing other programming tasks.

Now let’s look at how this same car sound could be developed using audio middleware. The biggest difference would be that this whole task can be accomplished by you, the mighty sound designer! Most audio middleware packages provide a graphical user interface, which looks similar to Pro Tools or other DAW applications; the sound designer would work in this interface and deliver a finished sound setup to the programmer. The audio designer and the programmer only have to know and agree on certain parameters, called hooks, that are used in the game.

Let’s get back to our three files: car fast, medium, and slow. The sound designer will create these sounds and import them into the middleware program of choice. Once they are in the program, the designer will add necessary elements such as crossfades, or map game parameters to the audio. In this example, the game engine will commonly be keeping track of the speed of the car, probably in miles or kilometers per hour, and the programmer will create a hook called car_speed inside their code and give this hook to the audio designer. As the car speeds up, this parameter value will increase.

First, however, we need to create what is often referred to as an event. Remember when we referred to being concerned about the action itself rather than the time it takes place? This is an extension of that concept embraced by middleware. Think of an event as a “container” for some kind of action or state in the game; an event can be tied to anything in the game, and it can consist of multiple parameters driven by the game’s logic.
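One way to picture an event as a container is the small Python sketch below. It is an illustration only, not how any particular middleware stores events: the `AudioEvent` class and its `set_parameter` method are hypothetical names, and `car_speed` is borrowed from the racing example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AudioEvent:
    """Hypothetical event: a named container with game-driven parameters."""

    name: str
    parameters: Dict[str, float] = field(default_factory=dict)

    def set_parameter(self, key: str, value: float) -> None:
        """Update a parameter the event declared; reject unknown names."""
        if key not in self.parameters:
            raise KeyError(f"{self.name} has no parameter {key!r}")
        self.parameters[key] = value


# The game's logic drives the event by updating its parameters.
car_engine = AudioEvent("car_engine", {"car_speed": 0.0})
car_engine.set_parameter("car_speed", 88.0)
```

Rejecting unknown parameter names is a design choice made for this sketch: it catches a mismatch between the designer's setup and the programmer's code early.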

Now that we have the hook of car speed, we can create a custom parameter in our middleware editor or GUI that is mapped to the different engine sounds we have. The value of the parameter will then control the timing and crossfades as well as the selection and balance of our low, medium, and high-speed car engine sounds. It may even control the speed/pitch of the sounds. It is then a relatively simple matter to export sound files, along with a text or configuration file, and deliver to the programmer. All the programmer has to do is tie the value of the parameter to the game’s event using lines of code, and voilà! The car’s visual speed will now match its perceived audio intensity.
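The parameter mapping described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the behavior of any specific tool: `MAX_SPEED`, `engine_mix`, and `engine_pitch` are hypothetical names, the crossfades are plain linear ramps, and the pitch range (1.0 to 1.5 times normal speed) is an arbitrary choice.

```python
from typing import Dict

MAX_SPEED = 200.0  # assumed top speed, in the same units as car_speed


def normalize(car_speed: float) -> float:
    """Clamp car_speed to 0..MAX_SPEED and scale it to the range 0.0-1.0."""
    return min(max(car_speed / MAX_SPEED, 0.0), 1.0)


def engine_mix(car_speed: float) -> Dict[str, float]:
    """Map car_speed to gains (0.0-1.0) for the slow, medium, and fast loops."""
    x = normalize(car_speed)
    if x <= 0.5:
        # Lower half of the range: crossfade from slow to medium.
        t = x / 0.5
        return {"slow": 1.0 - t, "medium": t, "fast": 0.0}
    # Upper half of the range: crossfade from medium to fast.
    t = (x - 0.5) / 0.5
    return {"slow": 0.0, "medium": 1.0 - t, "fast": t}


def engine_pitch(car_speed: float) -> float:
    """Raise the playback rate from 1.0 at rest to 1.5 at top speed."""
    return 1.0 + 0.5 * normalize(car_speed)


# At a quarter of top speed, slow and medium are blended equally.
print(engine_mix(50.0))  # {'slow': 0.5, 'medium': 0.5, 'fast': 0.0}
```

In a middleware editor the designer would normally draw these curves in the GUI. The code only shows what the resulting mapping computes each time the game updates car_speed.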

This is a very generalized example that should give you an overview of how the middleware process works. The details surrounding the building of complex audio events tied to game states and parameters can get quite involved. The main point to take away here, however, is that middleware creates an all-round win-win situation: Sound designers have more control over their work, confident that the material they deliver is actually what they will hear in the game, and the programmer doesn’t have to spend as much time on the audio in the game.

Audio Middleware Structure

Most middleware consists of a programmed codebase that is split into two major sections or areas. The most important to us is the GUI. This section, usually a separate application, is designed primarily for the audio designer to import, configure, and test audio. Its main purpose is to allow the non-code-savvy composer or sound designer to become a sound implementer.

Once audio configuration and testing are complete, the implementer can then build one or more sound banks with configuration information that the programmer can use to integrate the audio system with the rest of the game code. Commonly these banks and files can then be placed inside the game by the implementers in order to test them out within the game environment. In some cases, the game itself can be hooked to the middleware in real time.
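As a rough picture of what "a sound bank with configuration information" might contain, the Python sketch below loads a made-up JSON manifest. The format is purely hypothetical: real middleware tools define their own bank formats, and the field names (`bank`, `events`, `files`, `parameters`) and the `load_bank` helper are assumptions made for this example.

```python
import json
from typing import Any, Dict

# Hypothetical manifest a designer might export alongside the audio files.
BANK_JSON = """
{
  "bank": "vehicles",
  "events": {
    "car_engine": {
      "files": ["car_slow.wav", "car_medium.wav", "car_fast.wav"],
      "parameters": ["car_speed"]
    }
  }
}
"""


def load_bank(text: str) -> Dict[str, Any]:
    """Parse a manifest and check that every event lists at least one file."""
    bank = json.loads(text)
    for name, event in bank["events"].items():
        if not event.get("files"):
            raise ValueError(f"event {name!r} has no audio files")
    return bank


bank = load_bank(BANK_JSON)
# The programmer can now see which events and hooks the designer expects.
print(sorted(bank["events"]))
print(bank["events"]["car_engine"]["parameters"])
```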

The Other Side of the Fence: The API

Although the audio middleware system is constructed around a codebase, the game programmer will rarely deal with it at that level. It’s much more common to use an API, or Application Programming Interface. This is a set of instructions and script-based standards that allows access to the codebase without having to delve too deeply into the main code, which in some cases may not be accessible; in other words, it may be what’s called closed source. By contrast, web technologies such as HTML5, CSS, and jQuery are openly published, which means their specifications or code are public.

A middleware company, such as an audio tool developer, usually releases its API to the public so that other software developers can design products that are powered by its service. In some cases, the company lets certain groups, such as students and individual users, use it for free.

Middleware Tools

Let’s take a brief look at the commercial middleware products available today. These are used mainly by small to midsize independent game developers.

Image: Firelight FMOD Studio offers sample-accurate audio triggering and a mixing system that allows buses, sends, and returns.

• Firelight FMOD Studio

Firelight Technologies introduced FMOD in 2002 as a cross-platform audio runtime library for playing back sound for video games. Since its inception, FMOD has branched into a low-level audio engine with an abstracted Event API, and a designer tool that set the standard for middleware editing/configuration reminiscent of DAWs. Firelight has continued to innovate, releasing a brand-new audio engine in 2013 called FMOD Studio, which offers significant improvements over the older FMOD Ex engine, such as sample-accurate audio triggering, better file management, and an advanced audio mixing system that allows buses, sends, and returns.

Within the FMOD toolset, a sound designer/implementer can define the basic 3D/2D parameters for a sound or event. The designer can also mock up complex parametric relationships between different sounds using intuitive crossfading, draw in automation curves, and use effects and third-party plugins to change the audio. Music can be configured within FMOD Studio using tempo-based markers and timed triggers. Due to its flexible licensing and pricing structure, FMOD is now a solid and widely adopted audio middleware choice and continues to be a major player in today’s game development environment.

• Audiokinetic Wwise

Introduced in 2006, the Wwise (Wave Works Interactive Sound Engine) toolset provides access to features of its engine from within a staggeringly comprehensive content-management UI. Its abstracted and modular Event system, which allows extremely complex and intricate results from simple operations that can be nested within each other, has become a standard for many development houses worldwide. Wwise can configure events via typical audio parameters, with finely controlled randomization of volume, pitch, surround placement, and effects, as well as logic, switch/state changes, attenuation profiles, and more. Its Interactive Music Engine enables the generation of unique and unpredictable soundtracks from a small amount of existing music material.

Image: Audiokinetic Wwise's Event System has become standard for many development houses worldwide.

Profiling is also a strong feature in Wwise; its ability to mock up every aspect of the engine’s behavior brings the toolset close to a full prototype simulation outside of the game engine.

Recently Audiokinetic announced a Mac-compatible public beta version of the Wwise Editor, plus upcoming MIDI integration of Wwise with support for downloadable sound banks of virtual instruments. Look for more developments on this front in the fall.
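The randomization of volume and pitch mentioned in the Wwise description above is a general technique for keeping repeated sounds from feeling mechanical. A minimal Python sketch of the idea follows; it does not use or imitate the Wwise API, and the function name `randomized_playback` and its default ranges are assumptions chosen for illustration.

```python
import random
from typing import Tuple


def randomized_playback(
    rng: random.Random,
    base_volume_db: float = 0.0,
    volume_range_db: float = 3.0,
    pitch_range_semitones: float = 2.0,
) -> Tuple[float, float]:
    """Return (volume in dB, playback rate) with a small random offset each."""
    volume_db = base_volume_db + rng.uniform(-volume_range_db, volume_range_db)
    semitones = rng.uniform(-pitch_range_semitones, pitch_range_semitones)
    # One semitone corresponds to a playback-rate ratio of 2 ** (1 / 12).
    playback_rate = 2.0 ** (semitones / 12.0)
    return volume_db, playback_rate


# Each footstep, gunshot, or door creak gets slightly different settings.
rng = random.Random(42)  # seeded only so that runs are repeatable
volume_db, playback_rate = randomized_playback(rng)
```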

• Tazman Audio Fabric

The Fabric toolset from Tazman Audio is another example of the changing dynamics of the game audio market. We’ve briefly mentioned the Unity3D game engine as a prominent force; although Unity is based around the FMOD engine and API, it offers very few features for sound designers to obtain middleware-like functionality without having to learn code in the process. Fabric was created to address this situation at a very high and sophisticated level.

With an event-based system for triggering sounds, plus randomized and sequenced backgrounds, parameter-based game mixing, and a modular effect-building system, Fabric is increasingly a tool of choice for developers who want a more high-level integrated tool inside Unity. Recently, however, Tazman Audio signaled its intent to release a version of the engine that works with a number of other game engines, which should be available sometime in 2014.

Image: Dark Tonic's Master Audio is an example of Unity-based middleware.

• Miles Sound System

The Miles Sound System is one of the most popular pieces of middleware. It has been licensed for more than 5,000 games on 14 platforms. Miles is a sophisticated, robust, and fully featured sound system that has been around for quite some time: John Miles first released MSS in 1991, in the early days of PC gaming. Today, Miles features a toolset that integrates high-level sound authoring with 2D and 3D digital audio, featuring streaming, environmental reverb, multichannel mixing, and highly optimized audio decoders. Along with all the usual encoding and decoding options, Miles uses Bink Audio compression for its sound banks. Bink Audio is close to MP3 and Ogg in compression size, but is said to use around 30 percent less CPU than those formats.

Other Unity-Based Middleware

Along with Fabric, a slew of audio middleware toolsets for Unity is becoming available. Though perhaps not as sophisticated in certain ways as the larger middleware options, these toolsets provide a lot of variety and flexibility in helping non-programmer types to implement interactive audio behavior in games that don't require a lot of adaptability.

Chief among these is Master Audio from Dark Tonic Software. Master Audio features easily configurable drag-and-drop creation of sound events with instancing, probabilities, and randomized pitch and volume. Preset objects can be set up to trigger sounds on a variety of basic game conditions, like collision, trigger messages, etc. Notably, it also features grouping of sound events into buses (similar to Groups on a traditional DAW mixer window), as well as ducking of buses or sounds via other sounds or buses. This is convenient for creating balanced mixes when lots of loud explosions happen, for example. The playlist section is also very versatile. Multiple playlists are supported, and accurate crossfading between tracks of the same length is easily configured using the new Syncro feature. Documentation is relatively clear and concise as well. Other tools with roughly equivalent features are the Clockstone Audio Toolkit, Sectr Audio, and SoundManager Pro.
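Ducking, as described above, simply means turning one bus down while another is active. The Python sketch below shows the core idea under simplifying assumptions: it is not Master Audio's implementation, the names `apply_ducking` and `duck_rules` are hypothetical, and the level change is instantaneous, whereas a real mixer would ramp the level down and back up over time.

```python
from typing import Dict, Set, Tuple


def apply_ducking(
    bus_levels_db: Dict[str, float],
    active_buses: Set[str],
    duck_rules: Dict[str, Tuple[str, float]],
) -> Dict[str, float]:
    """Return new bus levels after applying ducking rules.

    duck_rules maps a trigger bus to (target bus, reduction in dB): while
    the trigger bus is active, the target bus is lowered by that amount.
    """
    result = dict(bus_levels_db)  # leave the caller's levels untouched
    for trigger, (target, reduction_db) in duck_rules.items():
        if trigger in active_buses:
            result[target] -= reduction_db
    return result


levels = {"music": -6.0, "explosions": 0.0}
rules = {"explosions": ("music", 12.0)}

# While explosions are sounding, the music drops by 12 dB.
print(apply_ducking(levels, {"explosions"}, rules))  # {'music': -18.0, 'explosions': 0.0}
```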

This new field of Unity-based middleware is also producing innovations that some of the bigger-name developers don't even have yet. Sectr Audio provides volumetric and spline-based audio sources. (That's an audio source confined to a non-spherical space or curve.) Generative audio is provided by the new G-Audio toolset from Gregzo, as well as the more retro Sound Generator from DarkArts Studios, and much more. Do you want a plugin in your game that will bitcrush any audio source that is playing? You can get it here, as well. Look for this field to keep expanding. It's an incredibly fertile place at the moment, with lots of different approaches.

Last but most certainly not least, Unity has updated audio mixing features coming in Version 5, announced at the 2014 Game Developers Conference. Featuring in-game mixing, busing, ducking, and automation of mix parameters, the update is sure to be a welcome addition for game audio folks working in Unity when it hits (hopefully by fall 2014).

Take Control

As we have seen here, middleware engines give today’s sound designer an amazing degree of control over the way audio behaves and develops within a game environment. The first audio middleware programs were developed in-house and were proprietary to the companies that built them for specific uses within specific titles. Over the years, third-party companies have come along to provide audio engines and code bases for all sorts of platforms that did not have access to them before. Along the way, the sophistication and power of these tools have significantly increased. Nowadays, middleware puts a lot of this power directly into our hands, so we can make sure that the sound we are hearing in the game is exactly what’s intended.

Probably the best news for all of you budding game audio experts is that most of the software discussed here can be downloaded for free, which makes exploring and bringing yourself up to speed on the latest middleware programs easy. Make no mistake about it, understanding audio middleware programs is a skill that all game audio professionals should have readily available in their bag of tricks.

This feature was excerpted from The Essential Guide to Game Audio by Steve Horowitz and Scott R. Looney, ©2014 Taylor & Francis Group. All Rights Reserved.

Interactive learning materials, including the free level outlined in this feature and a companion iOS app, are available at


For one month only, receive a 25% discount on anything on the site.

Offer is good through September 31, and good for single use only.

Use promo code 'gai-em-mag-25'

Steve Horowitz composed the soundtrack to the Academy Award-nominated film Super Size Me, has worked on hundreds of game titles, and engineered the Grammy-winning True Life Blues: The Songs of Bill Monroe. Scott Looney pioneered interactive online audio courses for the Academy of Art University, and has taught at Ex’pression College, Cogswell College, and Pyramind Training. He is currently researching procedural and generative sound applications in games, and mastering the art of code.