Inside Netflix’s Bet on Advanced Video Encoding
June 24, 2024
By BR Hariyani
How Cutting-Edge Codecs and Obsessive Tweaks Have Helped Netflix Stay Ahead of the Curve — Until Now
Anne Aaron just can’t help herself.
Aaron, Netflix’s senior encoding technology director, was watching the company’s livestream of the Screen Actors Guild Awards earlier this year. While the rest of the world marveled at all those celebrities and their glitzy outfits sparkling in a sea of flashing cameras, Aaron’s mind immediately started to analyze all the associated visual challenges Netflix’s encoding tech would have to tackle. “Oh my gosh, this content is going to be so hard to encode,” she recalled thinking when I recently interviewed her in Netflix’s office in Los Gatos, California.
Aaron has spent the past 13 years optimizing the way Netflix encodes its movies and TV shows. The work she and her team have done allows the company to deliver better-looking streams over slower connections and has resulted in 50 percent bandwidth savings for 4K streams alone, according to Aaron. Netflix’s encoding team has also contributed to industrywide efforts to improve streaming, including the development of the AV1 video codec and its eventual successor.
Embracing New Challenges: Live Streaming and Cloud Gaming
Now, Aaron is gearing up for the next big challenge at Netflix. Not content with just being a service for binge-watching, the company ventured into cloud gaming and livestreaming last year. So far, Netflix has primarily dabbled in one-off live events like the SAG Awards. But starting next year, the company will stream WWE RAW live every Monday. The streamer nabbed the wrestling franchise from Comcast’s USA Network, where it has long been the No. 1 rated show, regularly drawing audiences of around 1.7 million viewers. Satisfying that audience week after week poses some very novel challenges.
“It’s a completely different encoding pipeline than what we’ve had for VOD,” Aaron said, using industry shorthand for on-demand video streaming. “My challenge to (my) team is to get to the same bandwidth requirements as VOD but do it in a faster, real-time way.”
To achieve that, Aaron and her team have to basically start all over and disregard almost everything they’ve learned during more than a decade of optimizing Netflix’s streams — a decade during which Netflix’s video engineers re-encoded the company’s entire catalog multiple times, began using machine learning to make sure Netflix’s streams look good, and were forced to tweak their approach when a show like Barbie Dreamhouse Adventures tripped up the company’s encoders.
From Bitrates to Per-Title Encoding
When Aaron joined Netflix in 2011, the company was approaching streaming much like everyone else in the online video industry. “We have to support a huge variety of devices,” said Aaron. “Really old TVs, new TVs, mobile devices, set-top boxes: each of those devices can have different bandwidth requirements.”
To address those needs, Netflix encoded each video with a bunch of different bitrates and resolutions according to a predefined list of encoding parameters, or recipes, as Aaron and her colleagues like to call them. Back in those days, a viewer on a very slow connection would automatically get a 240p stream with a bitrate of 235 kbps. Faster connections would receive a 1750 kbps 720p video; Netflix’s streaming quality topped out at 1080p with a 5800 kbps bitrate.
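That fixed, one-size-fits-all ladder can be sketched as a simple lookup table. The bitrates come from the figures above; the structure and function names are illustrative, not Netflix's actual code:

```python
# Illustrative fixed "recipe" ladder using the bitrates cited above.
# In this early scheme, every title was encoded at every rung,
# regardless of how visually complex it was.
BITRATE_LADDER = [
    ("240p", 235),    # very slow connections
    ("720p", 1750),   # faster connections
    ("1080p", 5800),  # top quality at the time
]

def renditions_for_title(title: str) -> list[str]:
    """Describe one encode job per ladder rung for a given title."""
    return [f"{title}: {res} @ {kbps} kbps" for res, kbps in BITRATE_LADDER]
```

The key limitation, as the next sections show, is that nothing in this table knows anything about the video's content.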
The company’s content delivery servers would automatically choose the best version for each viewer based on their device and broadband speeds and adjust the streaming quality on the fly to account for network slow-downs.
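The selection logic described here can be sketched roughly as follows. This is a deliberately simplified model with a made-up safety headroom factor; real adaptive-bitrate players are far more sophisticated:

```python
LADDER = [("240p", 235), ("720p", 1750), ("1080p", 5800)]  # (name, kbps)

def pick_rendition(measured_kbps: float, headroom: float = 0.8):
    """Pick the highest-bitrate rung that fits within a fraction of the
    measured throughput; fall back to the lowest rung on slow links.
    Re-running this as network measurements change models the
    on-the-fly quality adjustment the article mentions."""
    affordable = [r for r in LADDER if r[1] <= measured_kbps * headroom]
    if affordable:
        return max(affordable, key=lambda r: r[1])
    return min(LADDER, key=lambda r: r[1])
```

With a 3,000 kbps measurement this picks the 720p rung; drop to dial-up speeds and it falls back to 240p.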
To Aaron, with her eagle eye for encoding challenges, that approach seemed inadequate. Why spend the same bandwidth on much simpler visual fare as on an action movie full of car chases (lots of motion) and explosions (flashing lights and all that noisy smoke)? “You need less bits for animation,” explained Aaron.
My Little Pony, which was a hit on the service at the time, simply didn’t have the same visual complexity as live-action titles. It didn’t make sense to use the same encoding recipes for both. That’s why, in 2015, Netflix began re-encoding its entire catalog with settings fine-tuned per title. With this new, title-specific approach, animated fare could be streamed in 1080p with as little as 1.5 Mbps.
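The article doesn't detail the mechanism, but the per-title idea can be sketched as: try candidate bitrates, measure or predict the resulting quality, and keep the cheapest option that still looks good. The quality functions below are toy stand-ins, not real measurements:

```python
def per_title_bitrate(candidates_kbps, quality_of, target=90.0):
    """Return the lowest candidate bitrate whose quality score meets
    the target. `quality_of` stands in for real quality measurement."""
    for kbps in sorted(candidates_kbps):
        if quality_of(kbps) >= target:
            return kbps
    return max(candidates_kbps)

# Toy model: flat animation saturates in quality at low bitrates,
# so it earns a much lower rung than noisy live action would.
def cartoon_quality(kbps):
    return min(100.0, 70.0 + kbps / 50.0)

def action_quality(kbps):
    return min(100.0, 50.0 + kbps / 200.0)
```

Under this toy model, the cartoon settles at 1,500 kbps, matching the 1.5 Mbps figure cited above, while the action title needs the top rung.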
A Granular Approach: Per-Shot Encoding
Switching to per-title encoding resulted in bandwidth savings of around 20 percent on average — enough to make a notable difference for consumers in North America and Europe, but even more important as Netflix was eyeing its next chapter. In January of 2016, then-CEO Reed Hastings announced that the company was expanding into almost every country around the world — including markets with subpar broadband infrastructure and consumers who primarily accessed the internet from their mobile phone.
Per-title encoding has since been adopted by most commercial video technology vendors, including Amazon’s AWS, which used the approach to optimize PBS’s video library last year. But while Netflix’s encoding strategy has been wholeheartedly endorsed by streaming tech experts, it has largely been met with silence by Hollywood’s creative class.
Directors and actors like Judd Apatow and Aaron Paul were up in arms when Netflix began to let people change the playback speed of its videos in 2019. Changes to the way it encodes videos, on the other hand, never made the same kinds of headlines. That may be because encoding algorithms are a bit too geeky for that crowd, but there’s also a simpler explanation: the new encoding scheme was so successful at saving bandwidth without compromising on visual fidelity that no one noticed the difference.
Make that almost no one: Aaron quickly realized that the company’s per-title-based encoding approach wasn’t without faults. One problem became apparent to her while watching Barbie Dreamhouse Adventures. It’s one of those animated Netflix shows that was supposed to benefit the most from a per-title approach.
However, Netflix’s new encoding struggled with one particular scene. “There’s this guy with a very sparkly suit and a sparkly water fountain behind him,” said Aaron. The scene looked pretty terrible with the new encoding rules, which made her realize that they needed to be more flexible. “At (other) parts of the title, you need less bits,” Aaron said. “But for this, you need to increase it.”
Leveraging Machine Learning for Better Encoding
The solution to this problem was to get a lot more granular during the encoding process. Netflix began to break down videos by shots and apply different encoding settings to each individual segment in 2018. Two people talking in front of a plain white wall were encoded with lower bitrates than the same two people taking part in a car chase; Barbie hanging out with her friends at home required less data than the scene in which Mr. Sparklesuit shows up.
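A minimal sketch of that per-shot allocation, assuming shot boundaries and a 0-to-1 complexity score already exist (detecting those is the hard part, and the scaling formula here is purely illustrative):

```python
def allocate_per_shot(shots, base_kbps=1500):
    """Scale a base bitrate by each shot's visual complexity (0..1):
    calm shots land well under the base rate, busy ones well over it."""
    return [(name, int(base_kbps * (0.5 + 1.5 * score)))
            for name, score in shots]

shots = [
    ("two people talking, white wall", 0.1),
    ("car chase", 0.9),
    ("sparkly suit and fountain", 1.0),
]
```

Summed over a whole episode, the savings on the calm shots pay for the extra bits spent on scenes like the sparkly fountain.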
As Netflix adopted 4K and HDR, those differences became even more stark. “(In) The Crown, there’s an episode where it’s very smoky,” said Aaron. “There’s a lot of pollution. Those scenes are really hard to encode.” In other words: they require more data to look good, especially when shown on a big 4K TV in HDR, than less visually complex fare.
Aaron’s mind never stops looking for those kinds of visual challenges, no matter whether she watches Netflix after work or goes outside to take a walk. This has even caught on with her kids, with Aaron telling me that they occasionally point at things in the real world and shout: “Look, it’s a blur!”
It’s a habit that comes with the job and a bit of a curse, too — one of those things you just can’t turn off. During our conversation, she picked up her phone, only to pause and point at the rhinestone-bedazzled phone case. It reminded her of that hard-to-encode scene from Barbie Dreamhouse Adventures. Another visual challenge!
Still, even an obsessive mind can only get you so far. For one thing, Aaron can’t possibly watch thousands of Netflix videos and decide which encoding settings to apply to every single shot. Instead, her team compiled a few dozen short clips sourced from a variety of shows and movies on Netflix and encoded each clip with a range of different settings. They then let test subjects watch those clips and grade the visual imperfections from not noticeable to very annoying. “You have to do subjective testing,” Aaron said. “It’s all based on ground truth, subjective testing.”
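Subjective ratings like these are typically condensed into a mean opinion score per clip-and-setting pair before any model training. A minimal sketch, where the 5-point scale labels are assumptions in the spirit of the article's "not noticeable" to "very annoying" range:

```python
from collections import defaultdict
from statistics import mean

# Each rating: (clip, encode_setting, score), where 5 means the
# imperfections were not noticeable and 1 means very annoying.
ratings = [
    ("clip1", "low_bitrate", 2), ("clip1", "low_bitrate", 3),
    ("clip1", "high_bitrate", 5), ("clip1", "high_bitrate", 4),
]

def mean_opinion_scores(ratings):
    """Average the viewer scores for each (clip, setting) condition."""
    by_condition = defaultdict(list)
    for clip, setting, score in ratings:
        by_condition[(clip, setting)].append(score)
    return {cond: mean(scores) for cond, scores in by_condition.items()}
```

These averaged scores are the "ground truth" Aaron refers to: the target a quality-prediction model is trained against.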
The insights gained this way have been used by Netflix to train a machine learning model that can analyze the video quality of different encoding settings across the company’s entire catalog, which helps to figure out the optimal settings for each and every little slice of a show or movie. The company collaborated with the University of Southern California on developing these video quality assessment algorithms and open-sourced them in 2016. Since then, the approach has been adopted by much of the industry as a way to analyze streaming video quality and even earned Netflix an Emmy Award. All the while, Aaron and her team have worked to keep up with Netflix’s evolving needs — like HDR.
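Once a model can score every encode, one common way to pick settings automatically (a general rate-vs-quality idea, not necessarily Netflix's exact method) is to discard any encode that a cheaper alternative matches or beats, keeping only the efficient frontier:

```python
def efficient_encodes(points):
    """From (kbps, predicted_quality) pairs, keep only encodes that are
    not dominated by a cheaper, equal-or-better alternative."""
    front, best_quality = [], float("-inf")
    for kbps, quality in sorted(points):  # ascending bitrate
        if quality > best_quality:
            front.append((kbps, quality))
            best_quality = quality
    return front
```

In this sketch, a 2,000 kbps encode that looks worse than a 1,000 kbps one simply never gets served.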
“We had to develop yet another metric to measure the video quality for HDR,” Aaron said. “We had to run subjective tests and redo that work specifically for HDR.” This eventually allowed Netflix to encode HDR titles with per-shot-specific settings as well, which the company finally did last year. Now, her team is working on open-sourcing HDR-based video quality assessment.
Collaborating on Codec Development
Slicing up a movie by shot and then encoding every slice individually to make sure it looks great while also saving as much bandwidth as possible: all of this work happens independently of the video codecs Netflix uses to encode and compress these files. It’s kind of like how you might change the resolution or colors of a picture in Photoshop before deciding whether to save it as a JPEG or a PNG. However, Netflix’s video engineers have also actively been working on advancing video codecs, even collaborating with some of their fiercest competitors in the process.
A few years ago, that cooperation produced AV1, a more efficient video codec that has since become the default format for Netflix’s streams and is on track to reduce bandwidth consumption by 50 percent on average once all devices support it. Beyond better compression, AV1 also includes new features that give companies like Netflix more ways to save data.
“Imagine that you have two people talking in a conference room, and then you have a panning motion,” explained Aaron. “You can actually tell the video codec: ‘Oh, this is the same conference room; I’m just moving around.’ So the video codec does not have to reencode that.”
Rolling Out AV1 Across Devices
Netflix began switching from the legacy H.264 format to AV1 last year and has since been rolling it out across its entire device ecosystem. The company expects all of the devices used to watch Netflix streams to support AV1 by the end of 2024. This includes a wide range of streaming devices and smart TVs, but also game consoles like the PlayStation 4. “We worked with Sony to enable AV1 for Netflix on the PS4,” Aaron said.
All of this is a far cry from how the company handled video when Aaron first joined in 2011. Back then, she recalls being proud of Netflix streaming content at 1080p with a 5800 kbps bitrate. “Now, we’re delivering better quality at like half that bitrate,” Aaron said.
AV1 isn’t the final destination on this journey. In fact, Aaron and her team have been working with other streaming companies on a successor called VVC, which stands for Versatile Video Coding. “We’re excited for VVC because it can further help us reduce our footprint,” she said. “It will take time before we can use it on all devices, but we’re excited about it.”
The Challenge of Real-Time Encoding for Live Streaming
But the biggest challenge that Netflix’s encoding team is now facing doesn’t actually require any of this new video codec work. It’s the return to real-time encoding, a concept that Aaron and her colleagues thought they had left behind for good when they first began working on per-title encoding in 2015.
Netflix’s encoding team has been applying the same per-shot encoding techniques to its cloud gaming efforts, but the company’s live streaming ambitions are a different beast. “For gaming, you have to do it in less than 100 milliseconds,” Aaron said. “But for live streaming, the goal is to get close to the VOD quality.”
Essentially, Aaron and her team have to figure out how to tweak their entire workflow to get to a point where they can apply different encoding settings to different shots in a matter of seconds. This is likely going to be a multiyear effort, and one that will give Aaron plenty of opportunities to obsess over the tiniest details — for instance, the details of the glitzy SAG Awards red carpet.