Engineer Andrew Scheps is on a mission to bring music and tech communities together for a common understanding of how consumer audio formats impact the listening experience. Learn why these specs are so important—and how you might have more control than you think.
You know how to make a killer mix. And you know to make sure that it is carefully mastered. But what happens after that? What exactly happens to your tracks once they are sent out into the world?
Engineer Andrew Scheps wanted answers to these questions.
With two Grammys and mixing credits ranging from Adele to U2, the Red Hot Chili Peppers, Black Sabbath, Jay-Z, and Metallica under his belt, it’s safe to say Scheps has the mixing part down. But like many producers and engineers, he needed a resource for understanding the differences in the variety of consumer audio delivery formats available to him. So he built it himself.
For the past year-and-a-half or so, Scheps has been on a road show of sorts, traveling around the country giving lectures on audio quality in association with the Recording Academy. His presentation, called “Lost in Translation,” gives attendees a chance to compare many of the current music services to demonstrate the audible differences among the various file formats they use.
Granted, file-format and bit rate information is made available by the services and can be found if you search for it, but Scheps’ presentation provides it all in one place and gives it context, accompanied by a listening session including examples from all the major services. And perhaps more importantly, in a time when music technology is more dependent than ever on distribution and delivery technologies, he’s bringing the production community and the music-tech community—including Amazon, Apple, Google, iTunes, PONO, Rdio, Rhapsody, SoundCloud, and YouTube—together to open a dialog about the importance of sound quality.
I sat down with Scheps to get his take on the evolving format landscape and learn how musicians, producers, and engineers can do their best to ensure that their music is heard as it was meant to be.
Tell me about the genesis of the “Lost in Translation” project.
When I started my label, I started to have to deal more with the production side of things than I ever had before. Obviously, as someone who mixes records, I care about mastering, but that’s sort of as far as it went. Once I had to start distributing through all of the digital services and manufacturing vinyl, I started to really pay attention to what happens after mastering.
Coinciding with that, the Recording Academy put on an event in L.A. called GRAMMY Future Now, which was a one-day, TED-style conference. They asked me to put together something, so I decided to do an audio-quality presentation. One that had nothing to do with the record-making side, but to take the record as a finished product—no matter what it sounds like, and no matter what sample rate it is, no matter whether it was an audiophile, good-sounding record or whether it’s a really dirty, distorted, good-feeling record—and follow it through the food chain from when it leaves mastering to when it gets into the consumer’s hands.
I really wanted to spend some time talking about the roads I’d traveled, figuring out why so much of the music I heard just didn’t sound that good.
How did you structure the demonstration?
For the comparative listening portion, I put together a playlist of 18 songs across a wide set of genres. I then went and got every single commercially available version of the songs that I could. I had the high-res that was for sale on HD Tracks, and a copy of the CD, which I would rip the AIFF files from. I then bought the tracks on iTunes and on Amazon, and then did live streaming at that point from Spotify, YouTube, and Rdio. Since then, the streaming landscape has changed quite a bit, so services come and go in the presentation. As far as I could, I tried to make sure that exactly the same master had been used to create these files for all of the services.
The idea is, attendees pick one of the songs, pick a format they want to listen to it in, and then compare it to another format immediately afterwards. This allows people to really listen to the different formats side by side and make real comparisons. In most cases it’s a very real difference in audio quality that is pretty easy to hear as you jump around.
I quickly realized that just playing the tracks wasn’t enough on its own. I really needed to provide context so everybody knew exactly what they were listening to without me having to stop and explain anything, so that became the setup with slides. I talk about the history of recorded music, digitization, lossless versus lossy codecs, and then lay out which file formats and bit rates the music services are using. And then, of course, you have to go into, well, who cares if it sounds better or worse? So you have to start talking about how people react when they listen to music, which in a lot of ways is a much more important issue, trying to talk about the emotional content of the music, which is the whole point. Artists, producers and engineers often talk about the technical side of recording and making records, but all we’re really trying to do is to make some art come out of the speakers and induce emotion in the listener. So how is that impacted by the audio quality?
So much of this is anecdotal, and subjective.
Yeah, it is very anecdotal. And that’s the problem. Because there are all kinds of anecdotes about how, well, you know, kids who’ve only heard MP3s prefer MP3s. And it turns out that’s not true at all, and that’s been tested in a very scientific way.
Everybody is going to make their own kind of judgment and decision. But I love that I can give people the opportunity to get in a room and actually listen back to back. It’s not double blind, and it’s not the most scientific thing in the world, but it’s a pretty good chance to actually hear the differences for yourself.
During your demo, it was interesting to watch the audience question playback volume, converters and cables, ask you to turn off the air conditioning—but you don’t really need a critical listening scenario to hear the difference.
One of the things that really drove it home to me is when I was putting the presentation together for the very first time. I’d gotten all of these files together, and I was arranging them all. I was checking things on the computer, and listening on my laptop speakers, just to make sure, okay, that is that song, and then looking at the file info to make sure it was the right bit rate, and things like that. And I could absolutely hear the difference on my laptop speakers, and I wasn’t expecting or trying to.
So it’s not this weird, audiophile, you-have-to-be-in-a-perfect-listening-environment thing to experience the difference; it’s just listening to music.
What were attendees—from both the tech community and the production community— most surprised to learn from these exercises?
You know what’s interesting? Basically the reactions have been exactly the same. When I’m with the production community, there’s a lot of information that people kind of know but don’t know, in terms of which bit rates all of these different service providers are giving you, what’s the difference between MP3 and AAC, who developed AAC, etc. They all know little bits and pieces of it but haven’t put it all together. And in the tech community, a lot of people know some of the theory—and in a much more thorough and technical way than I do—but they don’t have a big picture in terms of, what does that mean when you’re making a record?
But in terms of the actual listening experience, and what people are hoping to get out of music, it’s exactly the same no matter who is in the room. We are all just consumers and music lovers at the end of the day.
It’s a little surprising to hear that the production community doesn’t understand differences between bit rate, bit depth, what happens during transcoding.
I think everybody sort of has a handle on what those terms mean. But they haven’t put two and two together in terms of the different services, and what it is that they’re serving up, because they haven’t had to. It just doesn’t come up.
It’s sort of the macro version of word processors. Everybody used to know what a manual typewriter did because they were watching it do it. Nobody who isn’t a hardcore programmer really knows what a word processing program does anymore, and you don’t have to; you just have to know how to operate it. And I think for music production, it’s very much a parallel progression.
A lot of technical knowledge used to be a given in the recording industry, and it’s not any more. Anybody can open up a laptop and use the software that comes with it to make a record. And that record can sound amazing. I’m not advocating that everyone needs to have a hugely technical background. But it is very, very simple to just jump in, without any technical background at all. Which is great for the creative side of it, but from the technical side, it’s a disaster.
With physical media like vinyl, people have learned to accommodate for the format limitations in the production process.
Yeah, but you know what? Even that is slightly misunderstood, or at least there is still misinformation floating around. Because I grew up learning that with vinyl there are certain things you couldn’t do with the low frequencies. And I just cut a record for Low Roar that completely breaks the rules on what you’re allowed to do with bass, and it cut perfectly and plays great. So even on vinyl, which has been around for more than 50 years, I think we’re still figuring stuff out.
One interesting side effect of the resurgence in vinyl is that people are taking a look at the more technical aspects of the sonics of their work, because they’ve been producing music for so long without having any restrictions whatsoever, in terms of the physical limitations of the formats they’re delivering on. But now you’ve got music with so much more low end, and it’s so much louder than it used to be. Figuring out how to translate that onto this 50-year-old technology, it’s pretty fascinating how easily it can actually work.
What do you think are the biggest misconceptions about lossy codecs and streaming formats?
I don’t know if this is a misconception, but I think one of the things that people don’t necessarily take into account is that there’s nothing inherently horrible sonically about the lossy codecs. At low bit rates, they sound terrible. But at high bit rates, in quick A/B tests, it’s very hard to tell a well-made encode that you bought off iTunes apart from the CD.
What matters is what goes into those codecs. A master that is actually full-resolution, that has been prepared properly, that is 24-bit, that has about half a dB of headroom: that will make a great-sounding iTunes file, for instance.
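That half-a-dB figure is easy to sanity-check with a little arithmetic. The sketch below is my illustration rather than Scheps’ own math, and the 0.3 dB of encoder overshoot is an assumed example value; it just shows why a small trim survives encoding while a master pinned at digital 0 does not:

```python
def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (db / 20)

# A master trimmed by 0.5 dB peaks at roughly 94.4% of full scale.
headroom_factor = db_to_linear(-0.5)   # about 0.9441

# Suppose (hypothetically) lossy encoding adds 0.3 dB of overshoot
# on the loudest peaks.
overshoot = db_to_linear(0.3)          # about 1.0351

# A 0 dBFS master now exceeds full scale and clips hard;
# the trimmed master still fits under 1.0.
print(1.0 * overshoot)                 # greater than 1.0: clipped
print(headroom_factor * overshoot)     # less than 1.0: safe
```

The exact overshoot varies by codec, bit rate, and material, which is why the recommendation is stated as a range rather than a precise number.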
But you can have that exact same codec at a higher bit rate, which is what YouTube uses when you’re watching high-definition video. That’s actually the 320 AAC, as opposed to a 256 AAC, which is what iTunes sells. But if you give it an MP3 to encode, it’s going to sound terrible. Because you’re taking one lossy format and transcoding it to another.
Maybe there’s a little too much emphasis given to the numbers involved with the bit rate and the bit depth and the sample rate and things like that, and not enough attention paid to the process of actually creating these files in the first place.
You’re presenting a holistic argument, but we are dealing with real numbers in some sense.
Yeah, we are. But it isn’t as easy as just comparing the numbers. It’s a great place to start, and I think it’s a really good guideline, but it is much more complicated than that. At some point you actually really do have to listen, and you have to trust your ears. And to really live with those results and try to figure out what it means going forward. And that’s much, much more difficult.
With a visual product, it’s very easy to put ten people in a room and nine out of ten times they will agree on what looks better. People don’t trust their ears. You can be tricked by just making it a tiny bit louder. If you’re watching it with video, a better-looking video will make the audio sound better. It’s a much more difficult, weirder process to assess audio, and it’s much harder to be objective about it.
Data pipes are improving all the time; how do you see that impacting the evolution of these codecs?
I think it’s twofold. The first thing is, and I don’t want to sound angry about this, but it’s really, really, really important to remember that all of the streaming and download services are not music companies. They’re data delivery companies. It happens that they’re delivering music because that’s a very big market; even though it’s shrinking significantly every year, it’s still a huge amount of money.
There hasn’t really been an incentive for any of these companies to say, “Yeah, but we’re the better-sounding one.” Because it is much more about, “We’re the one that delivers all of your music, immediately, no matter where you are. And it doesn’t buffer, and it doesn’t stutter, and everything plays.” So as big as the pipes are, they’re always being tailored for worst-case scenarios.
Now, what’s interesting is that there are some technologies that help with that, like Orastream, which is an adaptive streaming technology, so the bit rate will actually climb and fall based on your connection. And it’s not each time you play a song that it decides what bit rate you’ll hear. It’s instantaneous and dynamic.
They use a version of a Fraunhofer codec; it’s the MP4 SLS layer. It’s a worldwide standard for data compression that wraps a file in a way that it can be played at any bit rate. And they can stream up to 192kHz/24-bit uncompressed audio. And there are companies like Tidal, which streams CD-quality audio. Deezer’s Elite service streams 35 million songs in 16-bit FLAC. Companies are starting to differentiate themselves based on bit rate.
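Orastream’s actual implementation isn’t public, but the core idea of adaptive bit-rate selection can be sketched in a few lines. The rate ladder and safety margin below are illustrative assumptions, not Orastream’s real values:

```python
# Hypothetical ladder of available stream rates in kbps
# (the last rung stands in for CD-quality PCM).
LADDER = [64, 128, 192, 320, 1411]

def pick_bitrate(measured_kbps, safety=0.8):
    """Choose the highest rung that fits within a safety
    margin of the currently measured connection throughput."""
    budget = measured_kbps * safety
    viable = [rate for rate in LADDER if rate <= budget]
    # Fall back to the lowest rung rather than stalling playback.
    return viable[-1] if viable else LADDER[0]

print(pick_bitrate(500))    # mid-speed connection
print(pick_bitrate(2000))   # fast connection
```

In a real player this decision runs continuously as the song streams, which is what makes the bit rate “instantaneous and dynamic” rather than fixed once per song.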
But I think that basically there’ll just be one little round of catch-up while everybody ups their bit rate, the same way they did when Apple flipped the switch and went from 128 to 256. And I wouldn’t be surprised if Apple is the last to do it, but they will up it as well, because Beats Music streams at 320, and they’ve bought that. But for any service to change the quality of the music they are providing, it means they have to re-encode their catalog, and they’re paying the labels millions of dollars a year for the privilege of playing the catalog already and that agreement may or may not cover another bit rate or format.
And as much as people want to vilify the streaming companies, 70 percent or more of their revenue is being paid to the rights holders. To really change the income model, companies need to require you to pay money to subscribe, as opposed to having a free tier. But as long as there’s YouTube, you have to have a free version of your streaming service to compete, or no one will use it. That’s the real quandary we’re in: the business model has been free for so long that no one will pay.
But that really does go back to the core of audio quality. The consumer drives all of this. If the consumer says we’re not going to buy your subscription because your audio sucks, then providers will make their audio better. If a provider gives the consumer something that is obviously better then he or she will want it.
The vast majority of music discovery is happening on YouTube, where audio is completely dependent on video streaming rates. What can music creators do?
It’s twofold, and neither side of it is terribly good. You have to decide whether you just don’t put your stuff up on YouTube at all, or you only put up really, really crappy versions hoping that makes people go buy your music if they like it.
But part of my argument in terms of how people listen to music is, the worse it sounds, the fewer people are going to like it enough to want to spend money on it. There’s some weird, shifting threshold of when you decide you love something; and the worse the music sounds, the harder it is for that music to make it up over that threshold.
So if you want to make it sound good, then every artist needs to upload their own music to YouTube, because they can control what file gets encoded. You can upload up to a 48k/24-bit WAV file for your audio source, and that will make a really good-sounding AAC when people are watching the high-res video, which gives you the 320k AAC file.
But then, all they’re doing is reinforcing the fact that that’s good enough in terms of where people are going to get their music. And the business model at YouTube is (to oversimplify): ignore copyright, sell ads on it.
If you upload a song that has a copyright holder, or you are the copyright holder and someone else uploads your song, you are presented with three choices. You can say, it’s fine that someone violated my copyright; just leave it up there for free and YouTube will host it, and that’s it. Or you can say, take it down, because it is copyrighted and someone has violated my copyright and that shouldn’t be allowed to happen. Or you can do what almost every single artist in the world does: say, okay, you’ve violated my copyright, but I really, really need the money, so please monetize that video with ads and give me some unknown percentage of the money, because it’s my song.
And it has just decimated the entire idea of copyright. People just assume that everything in the world should be on there for free.
Given so many end points for your music, and so many levels of control, do you take any of this into consideration in your mixes?
While making a record, I absolutely don’t take it into consideration at all. And I don’t really think anybody should. I think you need to make the record that sounds right to you. I use the phrase “the best-sounding record you can make.” But I don’t mean that in an audiophile, win-a-Grammy-for-your-engineering way. Because we’re not all that type of engineer. You just have to make the record that you think sounds awesome, and is exciting, and is the record you wanted to make musically.
Then the only thing really to know, if you have control over what is sent to the different digital services, is that, first of all, Mastered for iTunes just means they can encode from a 24-bit source instead of a 16-bit source. And they’ve optimized their codec so that a 24-bit source sounds much better than a 16-bit source. They would prefer that you send in 96k/24-bit, but their encoding at anything above 44.1 is a two-step process where first they just sample-rate convert it down to 44.1, because the AAC encoder itself can only accept 44.1.
The other thing is, especially on Mastered for iTunes, it has to come from a verified mastering house. So those mastering engineers know that one of the most important things to make that encoding work well is headroom. It doesn’t mean that your mix can’t be super, super loud and have lots of square waves and lots of distortion, but what it means is when you’re done, just gain down the mix between 0.5 and 0.7 dB, and it will sound better going through the encoder.
The reason is, as you go through any of the lossy encodings—MP3, AAC, Ogg Vorbis—you pick up harmonic distortion as you encode, and that adds level. And if you’re already at digital 0, then you’re going to overshoot, and you get very nasty distortion. Which is not the clipping that you’ve put into your mix because you like the way it sounds. It’s just full-on, digital shaving off the top of a waveform, with no regard to anything, because it just can’t recreate that level.
So if you’re just sending files to be put in, like if you’re using a digital aggregator, if you don’t have a distributor and you’re not going through a label, this is how you would put out your record: Take your final mixes, after they’re mastered and as loud as you want them to be, turn them down half a dB, and they will sound better once they are encoded. The same way you would do slightly different mastering for vinyl if you’ve got a very bright record, with a lot of esses.
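For the self-releasing reader, that trim is simple to apply in software. Here is a minimal sketch using only the Python standard library; the function name and file paths are mine, and a real mastering chain would also dither and handle bit depths other than 16-bit PCM:

```python
import array
import wave

def trim_gain(in_path, out_path, gain_db=-0.5):
    """Write a copy of a 16-bit PCM WAV file with a small gain trim,
    leaving headroom for a lossy encoder's overshoot."""
    factor = 10 ** (gain_db / 20)
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch handles 16-bit PCM only")
        samples = array.array("h", src.readframes(params.nframes))
    # Scale every sample by the linear gain factor.
    scaled = array.array("h", (int(s * factor) for s in samples))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())

# Hypothetical usage before handing the file to an aggregator:
# trim_gain("final_master.wav", "for_encoding.wav", gain_db=-0.5)
```

The point is that the trim happens after mastering, as the very last step before encoding, so the loudness relationships inside the mix are untouched.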
What about heavily compressed mixes?
Again, if you just turn them down before the encoding, then you’ll be fine. I know there are a lot of people who are up in arms about dynamic range, but I still mix loud, and I will probably always mix loud, because that’s what sounds good to me. I’m not going to win an engineering Grammy, and that’s fine. I’m okay with that. But I think people think a lot of the records I mix sound exciting, and that is what I’m proud of. And I know that if I take those incredibly loud mixes and turn them down 0.5, 0.7 dB, they’ll encode just fine, and sound like the loud mixes they already are.
Are you encouraged by the growth in high-resolution consumer formats?
I think there are some good omens in that area. I think things like Sonos are brilliant. Because they don’t make you give up the convenience: to have wireless speakers that can actually stream 48k/24-bit audio between each other is amazing. And they actually sound good.
There are a lot of cars that sound amazing now, where you would absolutely hear the difference between an MP3 and a CD.
There was a movement at AES this year, in association with the Consumer Electronics Association’s High Resolution Audio Initiative, to introduce a logo for devices that play high-res audio. And their definition of high-res audio is anything better than CD quality, so 44.1/24-bit would count as high-res.
Consumer-audio manufacturers see a need to differentiate themselves as having products that sound good and are capable of playing back these files. So that’s a huge step in the right direction.
Basically, the more people are aware that stuff could sound better, the better off we are. And I think that the idea of a home stereo is starting to take root again as well.
Sony has a high-res Walkman now, and they’re actually calling it a Walkman again, which I think is pretty fun. There’s the Pono player, which when it comes out will be very cool. Its business model is a little bit different, with its own store, but there are a lot of ways for people to listen to the music.
That was the other thing until recently: You’ve been able to buy this high-res stuff off of HD Tracks for years now. But you didn’t know how to play it back, and it was seen as an audiophile alternative, not just something that sounds better. Now, there are devices specifically set up to play those files, so you no longer have to be a geek to figure it out.
96/24 and DSD sound awesome, and some vinyl sounds awesome. There are some amazing-sounding CDs. I would say that the more I do this research, and the more I do the listening, the less I’m convinced that consumers need super-high-sample-rate stuff.
I think that the lossy-versus-lossless argument is a much, much more important topic to look at. I think there are neurological consequences to some of the lossy encoding that goes beyond just subjective audio quality assessment. Nothing is proven yet, but we’re heading in that direction.
It’s been made really easy for the consumer. But do you think that’s enough?
It takes education along with it. But I think that consumers love to have good stuff, and I think that we’re also going to get back to where the music that they love will be really important to them, even though they may not ever buy it, because I unfortunately think that that business model is gone now.
But they will care how it sounds when they want to play somebody a song they like. They’re not going to want to hit Play and it’s some transcoded YouTube video and they forget to go to high-res, and it sounds terrible, and their friend is like, “Yeah, whatever, that song’s okay, I guess.” They want people to be blown away, and they’ll actually want to make sure that it sounds good. So hopefully that will start to just be part of listening again.
Somebody goes to your presentation, gets inspired, and is ready to take the next step. What should they do?
Make sure that you’ve got masters that sound good, and make sure that whoever is sending them off to the services knows that maybe getting a little headroom for the encoders is a good idea. That’s sort of it from the record-making side. As artists, though, you can make your hi-res stuff available. Even if you’re signed to a label, force the label to make it available. And as consumers, you’ve just got to be smart; every time you can, take the YouTube video up to one of the HD video formats, because that’s the only way to get better-sounding audio. If you’re on Spotify, actually pay for the subscription and then go into audio preferences, and check high-quality audio, which is unchecked by default. It doesn’t take a whole lot to educate yourself.
Do you sort of feel like the poster child for this movement at the moment?
I have no idea. If I am, that’s a good thing. The whole idea of the presentation is to take lots of stuff that you kinda-sorta know and encapsulate it and put it in context so now you can go off and do something yourself. So if I’m the poster child for that sort of education, I love that.
Consumer Audio Formats
Bit Rates and Codecs, Compared

HIGH-RES: up to 192 kHz/32-bit WAV
CD: 44.1 kHz/16-bit WAV
YOUTUBE: 128 and 384 kbps AAC
PLAY MUSIC: up to 320 kbps
BEATS MUSIC: 320 kbps MP3
SPOTIFY: 160 and 320 kbps Ogg Vorbis
ITUNES: 256 kbps AAC
AMAZON: 256 kbps MP3
RHAPSODY/RDIO: 192 kbps MP3
XM SAT: 39 kbps proprietary
ANALOG: acoustic pressure wave, voltage