Engineer Andrew Scheps is on a mission to bring music and tech
communities together for a common understanding of how consumer
audio formats impact the listening experience. Learn why these specs are
so important—and how you might have more control than you think.
You know how to make a killer mix. And you know to make sure that it is carefully
mastered. But what happens after that? What exactly happens to your tracks once
they are sent out into the world?
Engineer Andrew Scheps wanted the answers to these questions.
With two Grammys and mixing credits ranging
from Adele to U2, the Red Hot Chili Peppers,
Black Sabbath, Jay-Z, and Metallica under his belt,
it’s safe to say Scheps has the mixing part down.
But like many producers and engineers, he needed
a resource for understanding the differences in the
variety of consumer audio delivery formats available
to him. So he built it himself.
For the past year-and-a-half or so, Scheps has
been on a road show of sorts, traveling around the
country giving lectures on audio quality in association
with the Recording Academy. His presentation,
called “Lost in Translation,” gives attendees
a chance to compare many of the current music
services to demonstrate the audible differences
among the various file formats they use.
Granted, file-format and bit rate information is
made available by the services and can be found if
you search for it, but Scheps’ presentation provides
it all in one place and gives it context, accompanied
by a listening session including examples from all
the major services. And perhaps more importantly,
in a time when music technology is more dependent
than ever on distribution and delivery technologies,
he’s bringing the production community
and the music-tech community—including Amazon, Apple, Google, iTunes, PONO, Rdio, Rhapsody,
SoundCloud, and YouTube—together to open a dialog
about the importance of sound quality.
I sat down with Scheps to get his take on the
evolving format landscape and learn how musicians,
producers, and engineers can do their best to ensure
that their music is heard as it was meant to be.
Tell me about the genesis of the “Lost in Translation” presentation.
When I started my label, I started to have to deal more
with the production side of things than I ever had before.
Obviously, as someone who mixes records, I care
about mastering, but that’s sort of as far as it went.
Once I had to start distributing through all of the
digital services and manufacturing vinyl, I started to
really pay attention to what happens after mastering.
Coinciding with that, the Recording Academy
put on an event in L.A. called GRAMMY Future
Now, which was a one-day, TED-style conference.
They asked me to put together something, so I decided
to do an audio-quality presentation. One that
had nothing to do with the record-making side, but
to take the record as a finished product—no matter
what it sounds like, and no matter what sample rate
it is, no matter whether it’s an audiophile, good-sounding record or a really dirty, distorted, good-feeling
record—and follow it through the food
chain from when it leaves mastering to
when it gets into the consumer’s hands.
I really wanted to spend some time talking
about the roads I’d traveled, figuring
out why so much of the music I heard just
didn’t sound that good.
How did you structure the demonstration?
For the comparative listening portion, I put together
a playlist of 18 songs across a wide set of genres.
I then went and got every single commercially
available version of the songs that I could. I had the
high-res that was for sale on HD Tracks, and a copy
of the CD, which I would rip the AIFF files from. I
then bought the tracks on iTunes and on Amazon,
and then did live streaming at that point from Spotify,
YouTube, and Rdio. Since then, the streaming
landscape has changed quite a bit, so services come
and go in the presentation. As far as I could, I tried
to make sure that exactly the same master had been
used to create these files for all of the services.
The idea is, attendees pick one of the songs, pick
a format they want to listen to it in, and then compare
it to another format immediately afterwards.
This allows people to really listen to the different
formats side by side and make real comparisons. In
most cases it’s a very real difference in audio quality
that is pretty easy to hear as you jump around.
I quickly realized that just playing the tracks wasn’t enough on its own; I really needed to provide context so everybody knew exactly what they were listening to without me having to stop and explain anything. That became the setup with slides. I
talk about the history of recorded music, digitization,
lossless versus lossy codecs, and then lay out
which file formats and bit rates the music services are using. And then, of course, you have to go into,
well, who cares if it sounds better or worse? So you
have to start talking about how people react when
they listen to music, which in a lot of ways is a much
more important issue, trying to talk about the emotional
content of the music, which is the whole point.
Artists, producers and engineers often talk about the
technical side of recording and making records, but
all we’re really trying to do is to make some art come
out of the speakers and induce emotion in the listener.
So how is that impacted by the audio quality?
So much of this is anecdotal, and subjective.
Yeah, it is very anecdotal. And that’s the problem.
Because there are all kinds of anecdotes about
how, well, you know, kids who’ve only heard MP3s
prefer MP3s. And it turns out that’s not true at all,
and that’s been tested in a very scientific way.
Everybody is going to make their own kind of judgment
and decision. But I love that I can give people the
opportunity to get in a room and actually listen back to
back. It’s not double blind, and it’s not the most scientific
thing in the world, but it’s a pretty good chance to
actually hear the differences for yourself.
During your demo, it was interesting to watch
the audience question playback volume, converters
and cables, ask you to turn off the air
conditioning—but you don’t really need a critical
listening scenario to hear the difference.
One of the things that really drove it home to me
is when I was putting the presentation together for
the very first time. I’d gotten all of these files together,
and I was arranging them all. I was checking
things on the computer, and listening on my laptop
speakers, just to make sure, okay, that is that song,
and then looking at the file info to make sure it was
the right bit rate, and things like that. And I could
absolutely hear the difference on my laptop speakers,
and I wasn’t expecting or trying to.
So it’s not this weird, audiophile, you-have-to-be-in-a-perfect-listening-environment thing to experience
the difference; it’s just listening to music.
What were attendees—from both the tech
community and the production community—
most surprised to learn from these exercises?
You know what’s interesting? Basically the reactions
have been exactly the same. When I’m with
the production community, there’s a lot of information
that people kind of know but don’t know, in
terms of which bit rates all of these different service
providers are giving you, what’s the difference
between MP3 and AAC, who developed AAC, etc.
They all know little bits and pieces of it but haven’t
put it all together. And in the tech community, a lot
of people know some of the theory—and in a much
more thorough and technical way than I do—but
they don’t have a big picture in terms of, what does
that mean when you’re making a record?
But in terms of the actual listening experience, and
what people are hoping to get out of music, it’s exactly
the same no matter who is in the room. We are all just
consumers and music lovers at the end of the day.
It’s a little surprising to hear that the production community doesn’t understand the differences between bit rate, bit depth, and what happens during encoding.
I think everybody sort of has a handle on what
those terms mean. But they haven’t put two and
two together in terms of the different services,
and what it is that they’re serving up, because they
haven’t had to. It just doesn’t come up.
It’s sort of the macro version of word processors.
Everybody used to know what a manual
typewriter did because they were watching it do
it. Nobody who isn’t a hardcore programmer really
knows what a word processing program does
anymore, and you don’t have to; you just have to
know how to operate it. And I think for music production, it’s very much a parallel progression.
A lot of technical knowledge used to be a given in
the recording industry, and it’s not any more. Anybody
can open up a laptop and use the software that comes
with it to make a record. And that record can sound
amazing. I’m not advocating that everyone needs to
have a hugely technical background. But it is very,
very simple to just jump in, without any technical
background at all. Which is great for the creative side
of it, but from the technical side, it’s a disaster.
With physical media like vinyl, people have
learned to accommodate for the format limitations
in the production process.
Yeah, but you know what? Even that is slightly misunderstood,
or at least there is still misinformation floating
around. Because I grew up learning that with vinyl
there are certain things you couldn’t do with the low
frequencies. And I just cut a record for Low Roar that
completely breaks the rules on what you’re allowed to
do with bass, and it cut perfectly and plays great. So
even on vinyl, which has been around for more than
50 years, I think we’re still figuring stuff out.
One interesting side effect of the resurgence in vinyl is that people are taking a look at the more technical
aspects of the sonics of their work, because they’ve
been producing music for so long without having any
restrictions whatsoever, in terms of the physical limitations
of the formats they’re delivering on. But now
you’ve got music with so much more low end, and it’s
so much louder than it used to be. Figuring out how
to translate that onto this 50-year-old technology, it’s
pretty fascinating how easily it can actually work.
What do you think are the biggest misconceptions
about lossy codecs and streaming formats?
I don’t know if this is a misconception, but I think one
of the things that people don’t necessarily take into account
is, there’s nothing inherently horrible sonically
about the lossy codecs. At low bit rates, they sound
terrible. But at high bit rates, in quick A/B tests, it’s
very hard to tell an encoded file that’s done well that
you bought off iTunes apart from the CD.
What matters is what goes into those codecs.
To have a master that is actually a full-res master,
that has been prepared properly, that is 24-bit,
that has about half a dB of headroom. That will
make a great-sounding iTunes file, for instance.
But you can have that exact same codec at a higher
bit rate, which is what YouTube uses when you’re
watching high-definition video. That’s actually 320kbps AAC, as opposed to the 256kbps AAC that iTunes sells. But if you give it an MP3 to encode, it’s
going to sound terrible. Because you’re taking one
lossy format and transcoding it to another.
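To make that concrete, here is a minimal sketch of the right and wrong way to feed an encoder, using ffmpeg driven from Python. The file names are illustrative, and this is not part of Scheps’ presentation:

import subprocess

def encode_aac(source, dest, bitrate="256k"):
    """Encode source to AAC at the given bitrate using ffmpeg."""
    subprocess.run(
        ["ffmpeg", "-i", source, "-c:a", "aac", "-b:a", bitrate, dest],
        check=True,
    )

# Good: encode straight from the full-res, 24-bit master.
encode_aac("master_24bit.wav", "release.m4a")

# Bad: feeding the encoder an MP3 just transcodes one lossy format
# into another, compounding the loss.
encode_aac("already_lossy.mp3", "transcoded.m4a")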
Maybe there’s a little too much emphasis given
to the numbers involved with the bit rate and the
bit depth and the sample rate and things like that,
and not enough attention paid to the process of actually
creating these files in the first place.
You’re presenting a holistic argument, but we
are dealing with real numbers in some sense.
Yeah, we are. But it isn’t as easy as just comparing
the numbers. It’s a great place to start, and I think
it’s a really good guideline, but it is much more
complicated than that. At some point you actually
really do have to listen, and you have to trust your
ears. And to really live with those results and try to figure out what that means going forward. And
that’s much, much more difficult.
With a visual product, it’s very easy to put ten
people in a room and nine out of ten times they will
agree on what looks better. People don’t trust their
ears. You can be tricked by just making it a tiny bit
louder. If you’re watching it with video, a better-looking
video will make the audio sound better. It’s
a much more difficult, weirder process to assess audio,
and it’s much harder to be objective about it.
Data pipes are improving all the time; how do you
see that impacting the evolution of these codecs?
I think it’s twofold. The first thing is, and I don’t
want to sound angry about this, but it’s really, really,
really important to remember that all of the streaming
and download services are not music companies.
They’re data delivery companies. It happens
that they’re delivering music because that’s a very
big market, even though it’s shrinking significantly
every year, but it is still a huge amount of money.
There hasn’t really been an incentive for any of
these companies to say, “Yeah, but we’re the better-sounding one.” Because it is much more about,
“We’re the one that delivers all of your music, immediately,
no matter where you are. And it doesn’t
buffer, and it doesn’t stutter, and everything plays.”
So as big as the pipes are, they’re always being tailored
for worst-case scenarios.
Now, what’s interesting is that there are some technologies
that help with that, like Orastream, which is
an adaptive streaming technology, so the bit rate will
actually climb and fall based on your connection. And
it’s not each time you play a song that it decides what
bit rate you’ll hear. It’s instantaneous and dynamic.
They use a version of a Fraunhofer codec; it’s
the MP4 SLS layer. It’s a worldwide standard for
data compression that wraps a file in a way that
it can be played at any bit
rate. And they can stream up
to 192kHz/24-bit uncompressed
audio. And there are companies like Tidal, which streams CD-quality audio.
Deezer’s Elite service streams
35 million songs in 16-bit
FLAC. Companies are starting
to differentiate themselves based on bit rate.
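The adaptive-streaming idea Scheps describes is simple to sketch. Here is a toy illustration in Python of how a player might pick a quality rung from measured throughput; the bitrate ladder and names are made up, and this is not Orastream’s actual algorithm:

LADDER_KBPS = [96, 192, 320, 1411]  # made-up quality ladder; 1411 ~ CD quality

def pick_bitrate(throughput_kbps, safety=0.8):
    """Pick the highest rung that fits comfortably in the connection."""
    usable = throughput_kbps * safety
    fitting = [r for r in LADDER_KBPS if r <= usable]
    return fitting[-1] if fitting else LADDER_KBPS[0]

# Re-run continuously, quality climbs and falls with the connection
# instead of being fixed once per song.
print(pick_bitrate(1500))  # -> 320
print(pick_bitrate(5000))  # -> 1411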
But I think that basically there’ll just be one little
round of catch-up while everybody ups their bit
rate, the same way they did when Apple flipped the
switch and went from 128 to 256. And I wouldn’t be
surprised if Apple is the last to do it, but they will up
it as well, because Beats Music streams at 320, and
they’ve bought that. But for any service to change the
quality of the music they are providing, it means they
have to re-encode their catalog, and they’re paying
the labels millions of dollars a year for the privilege
of playing the catalog already and that agreement
may or may not cover another bit rate or format.
And as much as people want to vilify the streaming
companies, 70 percent or more of their revenue
is being paid to the rights holders. To really change
the income model, companies need to require you
to pay money to subscribe, as opposed to having a
free tier. But as long as there’s YouTube, you have
to have a free version of your streaming service to
compete, or no one will use it. That’s the real quandary we’re in: the business model has been free for so long that no one will pay.
But that really does go back to the core of audio
quality. The consumer drives all of this. If the consumer
says we’re not going to buy your subscription
because your audio sucks, then providers will
make their audio better. If a provider gives the
consumer something that is obviously better, then
he or she will want it.
The vast majority of music discovery is happening
on YouTube, where audio is completely
dependent on video streaming rates. What can
music creators do?
It’s twofold, and neither side of it is terribly good.
You have to decide whether you just don’t put
your stuff up on YouTube at all, or you only put
up really, really crappy versions hoping that makes
people go buy your music if they like it.
But part of my argument in terms of how
people listen to music is, the worse it sounds, the
fewer people are going to like it enough to want to
spend money on it. There’s some weird, shifting
threshold of when you decide you love something;
and the worse the music sounds, the harder it is
for that music to make it up over that threshold.
So if you want to make it sound good, then every
artist needs to upload their own music to YouTube,
because they can control what file gets encoded.
You can upload up to a 48k/24-bit WAV file for
your audio source, and that will make a really good-sounding
AAC when people are watching the high-res
video, which gives you the 320kbps AAC file.
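Before uploading, it’s easy to verify that a file really is that 48k/24-bit source. A quick sanity check, assuming the third-party soundfile library and an illustrative file name:

import soundfile as sf

def check_youtube_source(path):
    """Warn if a WAV isn't a 48kHz/24-bit source."""
    info = sf.info(path)
    if info.samplerate != 48000:
        print(f"warning: sample rate is {info.samplerate} Hz, not 48000")
    if info.subtype != "PCM_24":
        print(f"warning: subtype is {info.subtype}, not 24-bit PCM")

check_youtube_source("upload_me.wav")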
But then, all they’re doing is reinforcing the
fact that that’s good enough in terms of where
people are going to get their music. And the business
model at YouTube is (to oversimplify): ignore
copyright, sell ads on it.
If you upload a song that has a copyright holder,
or you are the copyright holder and someone else
uploads your song, you are presented with three
choices: You can say, it’s fine that someone violated my copyright; just leave it up there for free, and YouTube will host it, and that’s it. Or you can say,
take it down, because it is copyrighted and someone
has violated my copyright and that shouldn’t
be allowed to happen. Or you can do what almost
every single artist in the world does: Say, okay,
you’ve violated my copyright, but I really, really
need the money, so please monetize that video
with ads and give me some unknown percentage
of the money, because it’s my song.
And it has just decimated the entire idea of
copyright. People just assume that everything in
the world should be on there for free.
Given so many end points for your music, and
levels of control, do you take anything into
consideration as far as your mixes?
While making a record, I absolutely don’t take it
into consideration at all. And I don’t really think
anybody should. I think you need to make the record
that sounds right to you. I use the phrase “the
best-sounding record you can make.” But I don’t
mean that in an audiophile, win-a-Grammy-for-your-engineering way. Because we’re not all that
type of engineer. You just have to make the record
that you think sounds awesome, and is exciting,
and is the record you wanted to make musically.
Then the only thing really to know, if you have
control over what is sent to the different digital
services, is that, first of all, Mastered for iTunes, all that means is they can encode a 24-bit source instead of a 16-bit source. And they’ve optimized their
codec so that a 24-bit source sounds much better
than a 16-bit source. They would prefer that you
send in 96k/24-bit, but their encoding at anything
above 44.1 is a two-step process where first they
just sample-rate convert it down to 44.1, because
the AAC encoder itself can only accept 44.1.
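That two-step process is easy to mirror with common tools. Here is a sketch using ffmpeg from Python, not Apple’s actual Mastered for iTunes toolchain; the file names are illustrative:

import subprocess

def encode_itunes_style(master, dest):
    """Sample-rate convert to 44.1kHz, then AAC-encode at 256kbps."""
    subprocess.run(
        ["ffmpeg", "-i", master,
         "-ar", "44100",                 # step 1: sample-rate conversion
         "-c:a", "aac", "-b:a", "256k",  # step 2: the AAC encode itself
         dest],
        check=True,
    )

encode_itunes_style("master_96k_24bit.wav", "itunes_style.m4a")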
The other thing is, especially on Mastered for
iTunes, it has to come from a verified mastering house.
So those mastering engineers know that one of the
most important things to make that encoding work
well is headroom. It doesn’t mean that your mix can’t
be super, super loud and have lots of square waves and
lots of distortion, but what it means is when you’re
done, just gain down the mix between 0.5 and 0.7 dB,
and it will sound better going through the encoder.
The reason is, as you go through any of the lossy
encodings—MP3, AAC, Ogg Vorbis—you pick up harmonic
distortion as you encode, and that adds level.
And if you’re already at digital 0, then you’re going to
overshoot, and you get very nasty distortion. Which is
not the clipping that you’ve put into your mix because
you like the way it sounds. It’s just full-on, digital shaving
off the top of a waveform, with no regard to anything,
because it just can’t recreate that level.
So if you’re just sending files out yourself, like if
you’re using a digital aggregator, if you don’t have a
distributor and you’re not going through a label, this
is how you would put out your record: Take your final
mixes, after they’re mastered and as loud as you want
them to be, turn them down half a dB, and they will
sound better once they are encoded. The same way
you would do slightly different mastering for vinyl if
you’ve got a very bright record, with a lot of esses.
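In code, that final trim is a single multiply. A minimal sketch, assuming the third-party soundfile library and illustrative file names:

import soundfile as sf

data, rate = sf.read("final_master.wav")

# -0.5 dB as a linear gain: 10 ** (-0.5 / 20) ~= 0.944
gain = 10 ** (-0.5 / 20)

# Pulling the peaks just below digital 0 leaves room for the level a
# lossy encoder adds, so the encode won't overshoot and clip.
sf.write("ready_for_encoding.wav", data * gain, rate, subtype="PCM_24")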
What about heavily compressed mixes?
Again, if you just turn them down before the encoding,
then you’ll be fine. I know there are a lot of people
who are up in arms about dynamic range, but I still
mix loud, and I will probably always mix loud, because
that’s what sounds good to me. I’m not going to
win an engineering Grammy, and that’s fine. I’m okay
with that. But I think people think a lot of the records I
mix sound exciting, and that is what I’m proud of. And
I know that if I take those incredibly loud mixes and
turn them down 0.5 to 0.7 dB, they’ll encode just fine,
and sound like the loud mixes they already are.
Are you encouraged by the growth in high-resolution audio?
I think there are some good omens in that area.
I think things like Sonos are brilliant. Because
they don’t make you give up the convenience: to
have wireless speakers that can actually stream
48k/24-bit audio between each other is amazing.
And they actually sound good.
There are a lot of cars that sound amazing now,
where you would absolutely hear the difference
between an MP3 and a CD.
There was a movement at AES this year, in association
with the Consumer Electronics Association’s
High Resolution Audio Initiative, to introduce a logo
for devices that play high-res audio. And their definition
of high-res audio is anything better than CD
quality, so 44.1/24-bit would count as high-res.
Consumer-audio manufacturers see a need to
differentiate themselves as having products that
sound good and are capable of playing back these
files. So that’s a huge step in the right direction.
Basically, the more people are aware that stuff
could sound better, the better off we are. And I
think that the idea of a home stereo is starting to
take root again as well.
Sony has a high-res Walkman now, and they’re
actually calling it a Walkman again, which I think
is pretty fun. There’s the Pono player, which when
it comes out will be very cool. Its business model
is a little bit different, with its own store, but there
are a lot of ways for people to listen to the music.
That was the other thing until recently: you could buy this high-res stuff off of HD Tracks for years, but you didn’t know how to play it back, and it was seen as an audiophile alternative, not just something that sounds better. Now, there
are devices specifically set up to play those files,
so you no longer have to be a geek to figure it out.
96/24 and DSD sound awesome, and some vinyl
sounds awesome. There are some amazing-sounding
CDs. I would say that the more I do this research, and
the more I do the listening, the less I’m convinced that
consumers need super-high-sample-rate stuff.
I think that the lossy-versus-lossless argument
is a much, much more important topic to look at.
I think there are neurological consequences to
some of the lossy encoding that goes beyond just
subjective audio quality assessment. Nothing is
proven yet, but we’re heading in that direction.
It’s been made really easy for the consumer.
But do you think that’s enough?
It takes education along with it. But I think that
consumers love to have good stuff, and I think that
we’re also going to get back to where the music that
they love will be really important to them, even
though they may not ever buy it, because I unfortunately
think that that business model is gone now.
But they will care how it sounds when they want
to play somebody a song they like. They’re not going
to want to hit Play and it’s some transcoded
YouTube video and they forget to go to high-res,
and it sounds terrible, and their friend is like, “Yeah,
whatever, that song’s okay, I guess.” They want
people to be blown away, and they’ll actually want
to make sure that it sounds good. So hopefully that
will start to just be part of listening again.
Somebody goes to your presentation, gets inspired,
and is ready to take the next step. What
should they do?
Make sure that you’ve got masters that sound good,
and make sure that whoever is sending them off to the
services knows that maybe getting a little headroom
for the encoders is a good idea. That’s sort of it from
the record-making side. As artists, though, you can
make your hi-res stuff available. Even if you’re signed
to a label, force the label to make it available. And as
consumers, you’ve just got to be smart; every time you
can, take the YouTube video up to one of the HD video
formats, because that’s the only way to get better
sounding audio. If you’re on Spotify, actually pay for
the subscription and then go into audio preferences,
and check high-quality audio, which is unchecked by
default. It doesn’t take a whole lot to educate yourself.
Do you sort of feel like the poster child for this
movement at the moment?
I have no idea. If I am, that’s a good thing. The
whole idea of the presentation is to take lots of
stuff that you kinda-sorta know and encapsulate it
and put it in context so now you can go off and do
something yourself. So if I’m the poster child for
that sort of education, I love that.
Consumer Audio Formats
Bit Rates and Codecs, Compared
HIGH-RES: up to 192kHz/32-bit WAV
CD: 44.1kHz/16-bit WAV
YOUTUBE: 128 and 384kbps AAC
PLAY MUSIC: up to 320kbps
BEATS MUSIC: 320kbps MP3
SPOTIFY: 160 or 320kbps Ogg Vorbis
ITUNES: 256kbps AAC
AMAZON: 256kbps MP3
RHAPSODY/RDIO: 192kbps MP3
XM SAT: 39kbps proprietary
ANALOG: acoustic pressure wave, voltage