To ensure your recordings are the best possible, you need highly accurate reference monitors—or do you? Some producers swear by speakers with well-known faults, while others use only the most pristine monitors. We resolve the conflicting viewpoints, discuss the vagaries of studio-monitor design, explain the most common specifications, and walk you through the process of choosing close-field monitors.

Judging by the steady flow of letters and phone calls we get asking our advice about what gear to buy, a good number of readers are well acquainted with cognitive overload. That's the term psychologists use to describe the paralysis that can set in when we are confronted by too many options (or too much information). Freedom of choice is great, but clearly, too many options can bewilder. Case in point: the EM 2001 Personal Studio Buyer's Guide lists 40 companies presently offering reference monitors, with more than 200 models to choose from.


Bewildered? If so, you've come to the right place. This article will cover the various designs, components, and properties (including terminology) of reference monitors, as well as how they work: in short, all you need to know to make informed decisions when selecting close-field reference monitors for your personal studio. (Though many of the concepts discussed here apply equally well to monitors for surround arrays, those interested specifically in monitoring for 5.1 should also see “You're Surrounded” in the October 2000 EM.)


Speakers used in recording studios are called monitors and generally fall into two categories: main monitors and compact or close-field reference monitors. Mains, as they are called, are mostly found in the control rooms of large commercial studios, often flush-mounted in a “false” wall (called a soffit); close-field reference monitors are freestanding and usually sit atop the console bridge or on stands directly behind the console.

Most personal studios don't have the space or funds for main monitors, so this article will focus on the compact reference monitor, a relatively recent studio tool. The first “compact” monitor to see widespread use in recording studios was the JBL 4311, a 3-way design introduced in the late 1960s. The 4311 was quite large, however (it had a 12-inch woofer, a 5-inch midrange speaker, and a 1.4-inch tweeter), and today would qualify more as a mid-field monitor.

As engineers increasingly realized the importance of hearing how their mixes sounded on car and television speakers, smaller reference monitors gained in popularity. One of the earliest favorites (around the mid-1970s) was the Auratone “cube,” which had a single 5-inch speaker.

Car and home-stereo speakers kept improving, of course, so engineers were always on the lookout for better close-fields. One compact model that caught on big was the Yamaha NS-10M (see Fig. 1). A bookshelf-type speaker introduced in 1978 for home use, the NS-10M soon became a familiar sight in commercial studios, and it remains popular, or at least ubiquitous, to this day.

Another significant development was the introduction in 1977 of the MDM-4 near-field monitor, made by audio pioneer Ed Long's company, Calibration Standard Instruments. The MDM-4s were great monitors, but it was the then-revolutionary concept of near-field monitoring that secured a chapter in audio history for Long. (Long also originated the concept of time alignment for speakers and trademarked the term “Time Align”; more on this later.) Though no one could have predicted how prophetic the term near-field monitor would prove, Long clearly understood its significance and so had it trademarked. (That is why EM uses the term close-field monitor instead.)


Curiously, because close-field reference monitors have become increasingly accurate during the course of time, the original rationale for using them (to generate a good indication of how mixes will translate to low-cost car and home-stereo speakers) has waned. But there are also other good reasons close-field monitors have become all but indispensable in music production. For one, professional mix engineers are typically hired on a project-by-project basis, which means they may end up in a different studio from one day to the next. Close-field monitors, because they are portable enough to be carted from studio to studio, make for an ideal solution and guarantee, at the minimum, some level of sonic consistency, regardless of the room.

But don't the monitors sound different in different rooms? To a degree, they do. But another advantage of close-field monitors is that they can partially mitigate the effect of the room on what you hear. As their name makes clear, they are meant to be used in the “near field,” typically about three feet from the engineer's ears. At that distance, assuming the monitors are well positioned and used correctly, the sound can pass to the ears largely unaffected by surface reflections (from the walls, ceiling, console, and so forth) and the various sonic ills they can wreak.
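One rough way to see why listening distance matters is to compare how far the direct sound and a reflection have to travel. The sketch below assumes simple inverse-square spreading and a perfectly reflective surface; the path lengths are hypothetical numbers chosen only for illustration, and real rooms add absorption and frequency-dependent effects that this ignores.

```python
import math

def relative_level_db(direct_path_ft, reflected_path_ft):
    """Level of a reflection relative to the direct sound, in dB.

    Assumes inverse-square spreading and no absorption at the
    reflecting surface, so this is a worst-case estimate.
    """
    return 20.0 * math.log10(direct_path_ft / reflected_path_ft)

# Near field: listener 3 ft from the monitor, hypothetical 10 ft reflected path.
near = relative_level_db(3.0, 10.0)
# Far field: listener 9 ft from the speaker, hypothetical 12 ft reflected path.
far = relative_level_db(9.0, 12.0)

print(f"near field: reflection is {near:.1f} dB relative to the direct sound")
print(f"far field:  reflection is {far:.1f} dB relative to the direct sound")
```

Under these assumptions the reflection sits much further below the direct sound at the near-field seat than at the far-field seat, which is the effect the paragraph above describes.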

For the same reason, close-field monitoring is also a good solution for the personal studio, where sonic anomalies are the norm. As engineer, consultant, and all-around acoustics wizard Bob Hodas has so well demonstrated, however, it's foolhardy to think close-field monitors entirely spare you from the effects of room acoustics. “Near-field monitors can be accurate,” explains Hodas, “only if care is taken in the placement of the speakers and room issues are not ignored.” (Find more information at


A common misconception among those new to music production is that home-stereo speakers are adequate for monitoring. That is, in fact, not the case. The problem is one of purpose: whereas manufacturers design reference monitors to reproduce signals accurately, home-stereo speakers are specifically designed to make recordings sound “better.” Typically, that perceived improvement is accomplished by boosting low and high frequencies. Although it may sound like an enhancement to the average listener, such “hype” is really a move away from accuracy.

Home-stereo speakers may also be engineered to de-emphasize midrange frequencies so as to mask problems in this critical range. That makes it difficult to hear what's going on in the midrange, which can tempt mixers to overcompensate with EQ. It can also lead to fatigue because the ear must strain to hear the mids.

Yet another reason home-stereo speakers are inappropriate for monitoring is that they are meant to be listened to in the far field, where much of the sound is reflected. But as we've seen, close-field monitors are designed to be used in the near field, in order to help minimize the effects of room acoustics. Of course, it's important not to sit too close to near fields. Rather, they should be positioned far enough back to allow the sound from the speakers to blend into an apparent point source and stereo soundstage. As you move in closer than three feet or so, the sound from each speaker becomes distinguishable separately, which is not what you want.


Everyone can agree that reference monitors are meant to reproduce signals accurately. But what is accuracy? For our purposes, there are three objective tests that can be performed to help quantify accuracy in reference monitors. The tests measure frequency response, transient or impulse response, and lastly, distortion.

Frequency response is a measure of the changes in output level that occur as a monitor is fed a full spectrum of constant-level input frequencies. The output levels can be plotted as a line on a graph, called a frequency response plot, in relation to a nominal level represented as a median line typically marked 0 dB (see Fig. 2). The monitor is said to have a “flat” or linear frequency response when that line corresponds closely to the median line; that is, it does not fluctuate much above or below from one frequency to the next.

When they are written out, frequency-response specifications first designate a frequency range, which is typically somewhere between 40 and 60 Hz on the low end and 18 to 22 kHz on the high end. To complete the specification, the frequency range is followed by a range specifier, which is a plus/minus figure indicating, in decibels, the range of output fluctuation. For example, the spec “50 Hz to 20 kHz (±1 dB)” means that frequencies produced by the monitor between 50 Hz and 20 kHz will vary no more than 1 dB up or down (louder or quieter) from the input signal. (That spec would suggest a very flat monitor, by the way!) Note that the range specifier may also be expressed as two numbers, for example “+1/-2 dB,” which is useful when the response varies more in one direction than the other.
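To make the range specifier concrete, here is a minimal sketch that checks a set of measured deviations against a spec such as “50 Hz to 20 kHz (±1 dB)” or “+1/-2 dB.” The function name `within_spec` and the measurement values are invented for illustration; they do not come from any real monitor.

```python
def within_spec(measurements, low_hz, high_hz, plus_db, minus_db):
    """Return True if every in-band point stays inside +plus_db/-minus_db.

    measurements: list of (frequency_hz, deviation_db) pairs, where
    deviation_db is the output level relative to the nominal 0 dB line.
    """
    in_band = [(f, d) for f, d in measurements if low_hz <= f <= high_hz]
    return all(-minus_db <= d <= plus_db for _, d in in_band)

# Hypothetical measurement data, invented for illustration only.
response = [(40, -4.0), (50, -0.9), (100, 0.3), (1000, 0.0),
            (5000, 0.8), (10000, -0.5), (20000, -1.0)]

# "50 Hz to 20 kHz (±1 dB)": the 40 Hz point lies outside the stated range.
meets_tight = within_spec(response, 50, 20000, 1.0, 1.0)
# "40 Hz to 20 kHz (+1/-2 dB)": the 40 Hz point now counts, and it fails.
meets_wider = within_spec(response, 40, 20000, 1.0, 2.0)

print(meets_tight, meets_wider)
```

The same data passes one spec and fails the other, which is why a frequency range quoted without its range specifier (or the reverse) says very little.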

Primary frequency-response measurements are made on-axis, that is, with the test mic directly facing the monitor, often at a distance of one meter. Also helpful are off-axis frequency response plots (measured with the mic at a 30-degree angle to the monitor, for example), which give an indication of how accurate the response will be, or how much it might change, as you reach for controls or gear located outside of the “sweet spot.” (The sweet spot is the ideal position to sit at in relation to the monitors; it is calculated by distance, angle, and listening.)

Transient or impulse response is a measure of the speaker's ability to reproduce the fast rise of a transient and the time it takes for the speaker to settle or stop moving after reproduction of the transient. Obviously, the first characteristic is critical to accurate reproduction of instrument dynamics and transients (such as the attack of a drum hit or a string pluck). The second is important because a speaker that is still in motion from a previous waveform will mask the following waveform and thus muddle the sound (see Fig. 3).
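As a rough illustration of settling time, suppose the cone's residual motion after a transient decays exponentially. That is a simplifying assumption made here, not something a manufacturer's graph guarantees, and the decay constant below is a hypothetical value. Under that model, the time for the motion to fall by a given number of decibels follows directly:

```python
import math

def settle_time_ms(decay_constant_ms, drop_db=60.0):
    """Time for an exponentially decaying envelope to fall by drop_db.

    Assumes amplitude(t) = exp(-t / decay_constant_ms), an idealized
    model of a driver ringing down after a transient.
    """
    return decay_constant_ms * math.log(10.0) * drop_db / 20.0

# Hypothetical driver with a 1 ms decay constant.
t60 = settle_time_ms(1.0)
print(f"time to fall 60 dB: {t60:.2f} ms")
```

A driver with a longer decay constant takes proportionally longer to settle, so more of its leftover motion overlaps whatever sound comes next.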

Distortion refers to undesirable components of a signal, which is to say, anything added to the signal that was not there in the first place. For monitors it can be divided into two categories: harmonic distortion and intermodulation distortion (IM). Harmonic distortion is any distortion related in some way to the original input signal. It includes second- and third-harmonic distortion, total harmonic distortion (THD), and noise (which are the types most commonly measured; see Fig. 4), as well as higher harmonic distortions (fifth, seventh, ninth, and so on). Intermodulation distortion is a form of “self-noise” that is generated by the speaker system in response to being excited by a dynamic, multifrequency signal; typically, it is more audible and more annoying than harmonic distortion.
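THD is commonly computed as the combined (root-sum-square) amplitude of the harmonics divided by the amplitude of the fundamental. The sketch below uses that common definition; the harmonic amplitudes are made-up numbers, and a published “THD + noise” figure would also fold in noise, which this ignores.

```python
import math

def thd_percent(fundamental, harmonics):
    """Total harmonic distortion as a percentage of the fundamental.

    fundamental: amplitude of the test tone at the output.
    harmonics: amplitudes of the 2nd, 3rd, ... harmonics.
    """
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# Hypothetical amplitudes: 2nd harmonic 0.003, 3rd harmonic 0.004,
# both relative to a fundamental of 1.0.
example = thd_percent(1.0, [0.003, 0.004])
print(f"THD: {example:.2f} percent")
```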

Frequency response, impulse response, and distortion levels should all be taken into account to get an idea of a monitor's accuracy. However, frequency response is often the only measure mentioned in product literature and reviews, and even it gets short shrift on occasion. (In many instances, I have seen frequency specs given with no range specifier, and of course, without it the specification is meaningless.) Few manufacturers provide an impulse response graph (even assuming they have measured impulse response), and often the only distortion specification given is “THD + noise.” In fact, the lack of established and agreed-upon standards for monitor (and for microphone) specifications, both for measuring them and for reporting them, is a long-standing industry issue. Though it is true that specs don't tell the entire story, they are useful for corroborating what our ears tell us, and as such they can help educate us so that we can listen more exactingly.


Now that we've established the raison d'être of the close-field monitor, let's take a look at its anatomy. We'll start with the internal components and work our way outward to the enclosure. Understanding how monitors are put together will help you know what to look for when deciding which best suit your needs.

Interestingly, the devices on either end of the recording signal chain, microphones and monitors, are very similar. Both are types of transducers, or devices that transform energy from one form into another. The difference is in the direction of energy flow: microphones convert sound waves into electrical signals and speakers convert electrical signals into sound waves. However, the components and operating principles of monitors and mics are essentially the same.

The speakers most commonly used in close-field monitors work in the same way as moving-coil dynamic microphones do, only in reverse. (Actually, there is a correlative speaker for other types of microphones as well, including ribbons and condensers. However, we will limit the discussion to the moving-coil type in this article.) In a moving-coil dynamic microphone, a thin, circular diaphragm is attached to a fine coil of wire positioned inside a gap in a permanent magnet. Sound waves move the diaphragm back and forth, causing the attached coil to move in its north/south magnetic field, thus generating a tiny electric current within the coil of wire.

In a loudspeaker, the coil of wire is known as the voice coil. As the electric current (audio signal) fluctuates in the wire, it generates an oscillating magnetic field that pushes and pulls against the magnet, causing the voice coil and attached diaphragm (in this case, the speaker cone; see Fig. 5) to vibrate. In turn, the vibrating speaker cone agitates nearby air molecules, creating the sound waves that reach our ears. (The ear, by the way, is also a transducer. It has a diaphragm, the tympanic membrane or eardrum, that converts acoustic sound waves into tiny electrochemical impulses which the brain then interprets as sound.)


A loudspeaker's magnet, voice coil, and diaphragm form, collectively, an assembly called a driver. (The moving-coil driver is the most common type, but there are other kinds as well.) Close-field monitors usually contain either two or three drivers, and thus are designated 2-way or 3-way, respectively. Standard 2-way monitors contain a woofer and tweeter; standard 3-ways contain a woofer, a tweeter, and a midrange driver. The woofer, of course, reproduces lower frequencies and the tweeter, the higher frequencies.

Cones and domes are the two most common types of diaphragms used in monitor drivers. Woofers and most midrange drivers employ cone diaphragms, typically made of treated paper, polypropylene, or more exotic materials such as Kevlar. (Note that the dome-shaped piece in the center of a woofer cone is a dust cap, not a dome.) Most moving-coil tweeters use a small dome, typically measuring one inch in diameter. One advantage of a small dome is that it exhibits fast transient response and a wide dispersion pattern, both of which are critical to the reproduction of upper frequencies. Domes are routinely made of treated paper too, but may also be made from a metal such as aluminum or titanium, or sometimes from stiffened silk, which some people believe sounds less harsh than metal.

When monitors employ separate drivers, as 2-way and 3-way monitors do, the design is termed discrete. In discrete designs, the drivers are usually mounted on the front face of the enclosure as close together as possible, which helps the sound blend into a coherent point source at the sweet spot. Depending on the monitors, the sound can change dramatically as you move away from the sweet spot.


Some companies, for example Tannoy, employ an alternative driver design in some of their monitors in which the tweeter is mounted in the center of the woofer cone (see Fig. 6). Though more expensive, this coaxial design is naturally more time coherent than discrete designs because the drivers are positioned on the same axis (as well as closer together). Indeed, the coaxial driver arrangement is one of the design elements (among others) that manufacturers have used to meet Ed Long's Time Align specification, mentioned earlier.

Before we can understand how time alignment can improve a monitor's accuracy, we must first understand the timing problems inherent in conventional monitor designs. Discrete loudspeakers cause minute delays that spread sounds out in time, resulting in lost detail and a blurred or smeared sound. Specifically, sound from the woofer is delayed more than sound from the tweeter. This problem has two main sources, one structural, the other electronic. In a discrete monitor with a flat-face enclosure, the woofer voice coil is naturally set back further than the tweeter voice coil because of the extra depth of the cone in relation to the dome. The tweeter is therefore closer to your ears, causing the high frequencies to arrive slightly ahead of the lows.

The problem is compounded by the crossover, an electronic circuit that splits the incoming signal into separate frequency bands and directs each band to the appropriate driver (more on crossovers momentarily). As it happens, crossovers also tend to delay low frequencies more than highs.
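The structural part of that delay is easy to estimate: it is simply the extra distance divided by the speed of sound. The sketch below assumes a speed of sound of about 343 m/s (room temperature) and a hypothetical 2 cm setback of the woofer voice coil; it says nothing about the crossover's contribution, which depends on the circuit.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C

def arrival_delay_us(setback_cm):
    """Extra arrival time, in microseconds, for a driver set back by setback_cm."""
    return (setback_cm / 100.0) / SPEED_OF_SOUND_M_S * 1_000_000.0

# Hypothetical 2 cm difference in voice-coil depth between woofer and tweeter.
delay = arrival_delay_us(2.0)
print(f"woofer arrives about {delay:.0f} microseconds late")
```

A few tens of microseconds sounds negligible, but at the crossover frequency, where both drivers reproduce the same signal, it amounts to a measurable phase difference between them.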

With his Time Align scheme, Long was the first to specify corrections for these problems, including physically lining up the drivers and adjusting driver and crossover delay parameters. When correctly implemented, Time Alignment ensures that the time relationships of the fundamentals and overtones of sounds are the same when they reach the listener as they were in the electrical signal at the input terminals of the monitor.

Over the years, some manufacturers have devised their own time-alignment schemes. You may recall, for example, the now-discontinued JBL 4200 series monitors, which employed protruding woofers designed to deliver low frequencies to the listener's ears simultaneously with highs from the tweeters.


As mentioned, the crossover's job is to divide the incoming signal into separate bands and then send each band to the appropriate driver. In inexpensive monitors, this is typically accomplished using simple lowpass and highpass filters that split the signal coming from the power amp. This is called a passive crossover. In more sophisticated systems, an active crossover splits the line-level signal before it gets to the power amp. This requires each driver to have its own power amp, and is called biamping in a 2-way monitor, triamping in a 3-way, and so on.
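The band-splitting idea can be sketched digitally in a few lines. This is only an illustration of the principle, not how any particular monitor's crossover is built: it uses a simple one-pole lowpass filter for the woofer band and takes the remainder as the tweeter band, and the 2 kHz crossover point and 48 kHz sample rate are arbitrary assumptions.

```python
import math

def split_bands(samples, crossover_hz, sample_rate_hz):
    """Split samples into (low_band, high_band) with a one-pole lowpass.

    The high band is the input minus the low band, so the two bands
    always sum back to the original signal.
    """
    # Smoothing coefficient of a standard one-pole lowpass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate_hz)
    low, high, state = [], [], 0.0
    for x in samples:
        state += alpha * (x - state)   # lowpass output feeds the "woofer"
        low.append(state)
        high.append(x - state)         # the remainder feeds the "tweeter"
    return low, high

# Hypothetical test tone: 100 Hz sine at a 48 kHz sample rate.
rate = 48000
tone = [math.sin(2.0 * math.pi * 100.0 * n / rate) for n in range(4800)]
woofer_feed, tweeter_feed = split_bands(tone, 2000.0, rate)
```

Real crossovers use steeper filters than this, but the division of labor is the same: low tones end up almost entirely in the woofer feed and high tones in the tweeter feed.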

Typically, monitors that have active crossovers incorporate internal power amps. These are called powered monitors. The terms active and powered, though often used interchangeably, actually refer to different things: active refers to the crossover, and powered to the fact that the amplifiers are part of the package. In other words, although active monitors are almost always powered, not all powered monitors are active. For example, Event Electronics at one time offered three versions of its popular 20/20 monitors: the straight 20/20 was unpowered and had a passive crossover; the 20/20p was powered but used a passive crossover; and the 20/20bas (biamplified system) was both powered and active.

In addition to giving a more exacting crossover performance, powered, active monitors offer other advantages over passive designs. Perhaps most importantly, because the amps and electronics are specifically designed to match the drivers and enclosure, powered monitors eliminate the guesswork and the potential pitfalls of matching an external amp to your monitors. (For a discussion of matching power amps to passive monitors, see the sidebar “A Good Match.”) This means reduced risk of blowing the drivers and virtually no risk of overtaxing the amps. In addition, the internal wiring is much shorter, which cuts down on frequency loss, noise induction, and other gremlins attributable to long cable runs. The upshot is that a powered, active system provides a more reliable reference: no matter where you take the monitors, you can be sure the only variable is room acoustics.


The enclosure is a critical part of any reference monitor design. Compact monitors present a particular challenge to designers because diminutive enclosures do not support low frequencies well. For many small monitors, the lowest practical frequency is around 60 Hz. However, certain techniques allow manufacturers to extend the low-frequency response of their boxes.

A common solution is to vent or port the enclosure (see Fig. 6). The concept of porting is quite complex, involving not only one or two visible holes, but also other acoustic-design constructions inside the cabinet. In this design, often termed a bass reflex system, the port helps “tune” the enclosure to resonate at frequencies lower than the woofer's natural rolloff. That is, as the frequencies drop below the monitor's lowest practical note, the enclosure begins to resonate at yet lower frequencies, essentially providing a bass “boost.” Although porting can extend the low-frequency response of the monitor well below a similarly sized but completely sealed enclosure (called an infinite baffle or acoustic suspension design), some people feel that the resulting bass extension is not a trustworthy reflection of what is really going on in the low end. (One noteworthy solution here is the incorporation of a subwoofer.)
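For a first-order feel of how a port “tunes” a box, designers often start from the classic Helmholtz resonator formula, which treats the enclosure as a cavity and the port as a short tube of air. The sketch below uses that textbook approximation only; the end-correction factor is a rule-of-thumb assumption, the dimensions are hypothetical, and real bass-reflex design accounts for much more (driver parameters, damping, port shape).

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C

def port_tuning_hz(box_volume_l, port_radius_cm, port_length_cm,
                   end_correction=1.46):
    """Approximate Helmholtz resonance of a ported box, in Hz.

    end_correction is an assumed rule-of-thumb multiple of the port
    radius added to the physical length to get the effective length.
    """
    volume_m3 = box_volume_l / 1000.0
    radius_m = port_radius_cm / 100.0
    area_m2 = math.pi * radius_m ** 2
    effective_length_m = port_length_cm / 100.0 + end_correction * radius_m
    return (SPEED_OF_SOUND_M_S / (2.0 * math.pi)) * math.sqrt(
        area_m2 / (volume_m3 * effective_length_m))

# Hypothetical compact monitor: 10-liter box, 5 cm diameter port, 10 cm long.
tuning = port_tuning_hz(10.0, 2.5, 10.0)
print(f"approximate port tuning: {tuning:.0f} Hz")
```

Within this approximation, a longer port or a larger box lowers the tuning frequency, while a wider port raises it.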

Ports tend to be round, oval, or slit-shaped, and usually are located on either the front or rear panel of compact monitors. Rear ports allow for a smaller front face, and therefore a more compact monitor, but they can also lead to sonic imbalances (the main one being excessive bass) in cases where the monitor is mounted too close to a wall or corner. Front ports help avoid this problem, but require a larger front face on the enclosure.

Another problem with front ports is that they can reduce the structural integrity of the front baffle (which is already weakened by at least two large holes, one each for the woofer and tweeter). Some ported monitors provide port plugs, which can be helpful for reducing low-frequency output in case you are forced to mount the monitor near a wall or corner. (A different solution for this problem is increasingly found in powered/active monitors: “contour” switches that let you adjust the monitor's low- and high-frequency output to compensate for acoustical imbalances in the listening space.)

Nowadays, most manufacturers build their enclosures from medium-density fiberboard (MDF), a material that offers better consistency and lower cost than wood. Grille cloths may or may not be provided with the monitors, but these are a cosmetic enhancement at best, and traditionally are removed for monitoring.

Because an enclosure's front baffle shapes the sound as it leaves the drivers, all aspects of the baffle must be taken into account by the designers. For this reason, designers often round off corners and sharp edges, and the face of the enclosure is kept as smooth and spare as possible in order to minimize interferences like diffraction (breaking up of sound waves). One critical acoustic-design feature on the front baffle is the wave guide, a shallow, contoured “cup” surrounding the tweeter. The structure and the shape of the wave guide both affect high-frequency dispersion, which in turn affects other sound qualities such as imaging (see Fig. 7).


Now that we've laid the groundwork, let's tally up what constitutes a superior monitor. Specifically, what do you hear in better monitors that you don't hear in lower-quality ones?

We already know one answer: accuracy. More than anything, the purpose and goal of a reference monitor is to transduce signals accurately. Monitoring is the last step in a long journey through the various processes required to get your music to its destination. Therefore reference monitors are your ultimate “feedback” system and the basis of all of the decisions you make about how to shape and process a mix.

As we've seen, the technical recipe for accuracy has three basic ingredients: accurate frequency response, accurate impulse response, and low distortion. Superior monitors boast a very flat frequency response, typically within ±3 dB of a nominal level. In addition, the frequency response should roll off smoothly at either end of the spectrum, as well as fall off evenly as you move away or off axis from the monitor.

Also critical is a monitor's impulse response. Ideally, this should be a direct analog to changes in air pressure in response to transient electrical signals; a superior monitor keeps all the “time domain” qualities of a signal intact, reproducing them in exactly the same time relation as they appear at the monitor's input terminals. In addition, in a superior monitor the frequencies issuing from discrete drivers are time aligned so as to compensate for the time misalignment inherent in discrete designs, as described earlier. That way, the highs, mids, and lows reach the listener's ears simultaneously.

Both impulse response and time alignment (among other things) figure prominently in two other critical sonic qualities of a reference monitor: soundstage and imaging. Soundstage refers to the imaginary stage that forms between two speakers (including width and depth), and imaging refers to how well the monitors can localize individual instruments on the soundstage. Obviously, a good soundstage and precise imaging are necessary for accurate positioning of instruments within the stereo field.

Distortion levels vary considerably from system to system. Whereas home-stereo speakers typically exhibit as much as 1 percent distortion above bass frequencies, some high-quality reference monitors may deliver as little as 0.1 percent. Though a low distortion spec is always desirable, some monitors with less-than-spectacular distortion specs still excel thanks to superiority by other measures. The human ear, however, is very sensitive to distortion, especially in the midrange (distortion is often a major contributor to ear fatigue).
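Distortion percentages are sometimes easier to compare when converted to decibels relative to the signal. The conversion below is the standard amplitude-ratio formula; the two inputs are simply the 1 percent and 0.1 percent figures quoted above.

```python
import math

def distortion_db(percent):
    """Convert a distortion percentage to dB relative to the signal level."""
    return 20.0 * math.log10(percent / 100.0)

home_stereo_db = distortion_db(1.0)
reference_db = distortion_db(0.1)
print(f"1 percent   -> {home_stereo_db:.0f} dB")
print(f"0.1 percent -> {reference_db:.0f} dB")
```

Seen this way, the tenfold reduction in distortion percentage corresponds to distortion products that sit a further 20 dB below the signal.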

Another helpful specification is speaker sensitivity or efficiency, which shows the monitor's output sound pressure level (in dB SPL) at a distance of 1 meter with an input signal of 1 W. All things being equal (which they rarely are), speaker sensitivity has no determining effect on sound quality. However, if you are doing an A/B comparison of two or more sets of passive monitors and running them from the same power amp through a switching box, it is important to be aware of differing sensitivities. Our ears can readily perceive even slight differences in SPL, and our brains naturally perceive louder sources as sounding better. If you fail to compensate for any sensitivity differences (that is, to ensure that each monitor is playing back at the same level), you are more prone to reach incorrect assessments of monitors while comparing them.
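A quick calculation shows how much a sensitivity difference matters in an A/B test. The sketch below assumes the usual idealization that output rises by 10 times the base-10 logarithm of the power in watts; the two sensitivity ratings are hypothetical, and real drivers compress somewhat at high power.

```python
import math

def spl_at_1m(sensitivity_db, power_w):
    """Idealized output level at 1 meter for a given amplifier power."""
    return sensitivity_db + 10.0 * math.log10(power_w)

def power_ratio_to_match(louder_sens_db, quieter_sens_db):
    """How many times more power the quieter monitor needs to match levels."""
    return 10.0 ** ((louder_sens_db - quieter_sens_db) / 10.0)

# Hypothetical pair: monitor A rated 90 dB (1 W/1 m), monitor B rated 87 dB.
level_a = spl_at_1m(90.0, 10.0)
level_b = spl_at_1m(87.0, 10.0)
ratio = power_ratio_to_match(90.0, 87.0)
print(f"A: {level_a:.0f} dB SPL, B: {level_b:.0f} dB SPL at 10 W")
print(f"B needs about {ratio:.1f} times the power to match A")
```

A 3 dB gap is easily audible, so without level matching the more sensitive monitor would tend to be judged the better one regardless of its actual merits.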


Accuracy is important because, ostensibly at least, it guarantees that what we hear from our monitors is the “audio truth.” Unfortunately, though, objective measures don't really guarantee accuracy. As helpful as specs may be, they are not really an indicator of how a monitor sounds; two similar monitors with near-identical specs can sound very different, for example. Therefore, as in all things audio, careful listening must be the final measure. After all, monitoring is inherently subjective.

But even if monitoring weren't subjective and reliable standards for accuracy could be decided on and agreed upon, the problem of wide-ranging sonic differences among playback systems would still persist. More important than accuracy is knowing how your mixes will translate to other speakers in other environments. That's the real bottom line. And the only way to gain that certainty is from experience. As they say, practice makes perfect, and it's no different with reference monitors than with musical instruments. After all, a monitor is a musical instrument of sorts. Thus the need to spend many hours, many days, many months working with a set of monitors, “practicing” on them, listening to your results on countless playback systems, always fine tuning, adjusting, figuring out what the quirks are, where the bumps and holes are, and how every little thing translates, until you reach a level of familiarity that allows you to work undaunted, confident that the mix you dial in will bear a strong resemblance to what the end-user ultimately hears. Regardless of what monitors you use, until you are intimately familiar with them, mixing will remain something of a guessing game.

This point was brought home to me recently as I chatted with ace mix engineer Chris Lord-Alge. With multiple platinum credits to his name, Lord-Alge certainly qualifies as an “expert” on the subject of monitoring, at least in the sense that he knows what it takes to turn out mixes that sound great across the board, from boom box to high-end audiophile system. And just as surely, Lord-Alge has attained success enough to acquire and use any monitor he wants. So what monitors does he use? The latest, greatest, most expensive ones available? Not at all. Rather, Lord-Alge uses the same monitors he has mixed on for most of his career: a pair of Yamaha NS-10Ms. “The key thing with any monitors,” explains Lord-Alge, “is that you get used to them. That's ultimately what makes them work for you. And 25 years on NS-10s hasn't led me wrong yet.”


This brings us to a can of worms I'd just as soon not open, but open it we must if we're to inquire seriously into the nature of reference monitoring. Anyone who has searched for the “perfect” monitor has run smack into this dilemma, which is best summed up by these questions: Who, ultimately, are you mixing for? The snooty audiophile with speakers that cost more than most folks' cars? Or the masses who listen to music on cheap systems?

Lord-Alge's answer is enlightening: “Ninety-five percent of people listen to music in their car or on a cheap home stereo; 5 percent may have better systems; and maybe 1 percent have a $20,000 stereo. So if it doesn't sound good on something small, what's the point? You can mix in front of these huge, beautiful, pristine, $10,000 powered monitors all you want. But no one else has those monitors, so you're more likely to end up with a translation problem.”

Similarly, I learned a few years ago that John Leventhal, who was one of my heroes at the time, did the bulk of his mixing on a pair of small Radio Shack speakers. (Leventhal, a New York City-based guitarist, songwriter, and engineer, made his mark by producing Shawn Colvin's acclaimed 1989 record, Steady On.) Leventhal owns both a pair of Yamaha NS-10Ms and a pair of Radio Shack Optimus 7s. But he prefers the latter.

SIDEBAR: Now What?
by Scott Wilkinson

Once you have selected your monitors, it's time to place them in