Learned Video Compression (1811.06981v1)

Published 16 Nov 2018 in eess.IV, cs.CV, cs.LG, and stat.ML

Abstract: We present a new algorithm for video coding, learned end-to-end for the low-latency mode. In this setting, our approach outperforms all existing video codecs across nearly the entire bitrate range. To our knowledge, this is the first ML-based method to do so. We evaluate our approach on standard video compression test sets of varying resolutions, and benchmark against all mainstream commercial codecs, in the low-latency mode. On standard-definition videos, relative to our algorithm, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger. On high-definition 1080p videos, H.265 and VP9 typically produce codes up to 20% larger, and H.264 up to 35% larger. Furthermore, our approach does not suffer from blocking artifacts and pixelation, and thus produces videos that are more visually pleasing. We propose two main contributions. The first is a novel architecture for video compression, which (1) generalizes motion estimation to perform any learned compensation beyond simple translations, (2) rather than strictly relying on previously transmitted reference frames, maintains a state of arbitrary information learned by the model, and (3) enables jointly compressing all transmitted signals (such as optical flow and residual). Secondly, we present a framework for ML-based spatial rate control: namely, a mechanism for assigning variable bitrates across space for each frame. This is a critical component for video coding, which to our knowledge had not been developed within a machine learning setting.

Summary

The paper presents an ML-trained video compression algorithm for the low-latency mode; relative to it, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger on SD content, and H.264 up to 35% larger on HD content.
It generalizes motion estimation beyond simple translations and introduces spatial rate control, which assigns variable bitrates across space within each frame.
The approach avoids blocking artifacts and pixelation and targets low-latency use, signaling a shift towards ML-driven video technologies.

An Analysis of "Learned Video Compression"

The paper "Learned Video Compression" by Rippel et al. applies ML to video coding, presenting an end-to-end learned algorithm that, in the low-latency mode, is reported to outperform existing video codecs across nearly the entire bitrate range. The evaluation benchmarks the method against established codecs such as HEVC/H.265, AVC/H.264, and VP9 and reports significant gains in compression efficiency.

Overview of Contributions

The core contribution of this work is an ML-trained video compression algorithm for the low-latency mode. The reported gains are stated relative to the learned codec: on standard-definition (SD) content, HEVC/H.265, AVC/H.264 and VP9 typically produce codes up to 60% larger, and on high-definition (HD) 1080p content, H.265 and VP9 typically produce codes up to 20% larger and H.264 up to 35% larger. The approach also avoids common compression artifacts such as blocking and pixelation, improving visual quality.
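
The abstract phrases its results as competing codecs producing codes "up to X% larger" than the learned codec, which is not the same as the learned codec being X% smaller. The short calculation below converts between the two views; the function name is illustrative and the inputs are the headline figures from the abstract.

```python
def percent_smaller(percent_larger: float) -> float:
    """If codec B's output is `percent_larger` % bigger than codec A's,
    return how many percent smaller A's output is relative to B's."""
    ratio = 1.0 + percent_larger / 100.0  # size_B / size_A
    return (1.0 - 1.0 / ratio) * 100.0


# Headline figures from the abstract ("up to X% larger" than the learned codec).
for label, larger in [
    ("SD, competitors up to 60% larger", 60.0),
    ("HD, H.264 up to 35% larger", 35.0),
    ("HD, H.265/VP9 up to 20% larger", 20.0),
]:
    print(f"{label}: learned codec is about {percent_smaller(larger):.1f}% smaller")
```

So a competitor code that is 60% larger corresponds to the learned code being about 37.5% smaller, not 60% smaller.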

Two major innovations set this research apart:

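A novel architecture that generalizes motion estimation to perform any learned compensation beyond simple translations, maintains a state of arbitrary learned information rather than relying strictly on previously transmitted reference frames, and jointly compresses all transmitted signals, such as optical flow and residual.
A framework for spatial rate control, which assigns variable bitrates across space within each frame. The authors describe this as a critical component of video coding that, to their knowledge, had not previously been developed in an ML setting.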
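
To make the idea of spatial rate control concrete, here is a minimal sketch that picks a quality level per tile of a frame by minimizing distortion plus a weighted bit cost. This is a generic Lagrangian allocation used only for illustration, not the paper's mechanism; the tile options, the `choose_levels` helper, and the weight `lam` are all assumptions.

```python
from typing import List, Tuple

# Hypothetical per-tile options: (bits, distortion) at increasing quality levels.
Option = Tuple[int, float]


def choose_levels(tiles: List[List[Option]], lam: float) -> List[int]:
    """For each tile, pick the level index minimizing distortion + lam * bits."""
    choices = []
    for options in tiles:
        costs = [dist + lam * bits for bits, dist in options]
        choices.append(costs.index(min(costs)))
    return choices


# Two illustrative tiles from one frame: a flat background and a detailed region.
flat = [(100, 2.0), (200, 1.8), (400, 1.7)]
detailed = [(100, 30.0), (200, 12.0), (400, 5.0)]

levels = choose_levels([flat, detailed], lam=0.02)
total_bits = flat[levels[0]][0] + detailed[levels[1]][0]
print(levels, total_bits)  # the flat tile gets the lowest level, the detailed tile the highest
```

With these made-up numbers the flat tile gains almost nothing from extra bits, so the budget goes to the detailed tile. That is the behavior a spatially varying bitrate is meant to allow.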

Architectural Advancements

The algorithm builds on recent advances in deep learning, in particular in learned image compression. Its architecture generalizes motion compensation and propagates a learned state instead of relying only on pixel-space reference frames. This enables the handling of intricate spatiotemporal dynamics such as out-of-plane rotations and complex background motion. These are forms of temporal redundancy that conventional codecs relying on block-matching strategies capture poorly.
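
As a rough illustration of how compensation can go beyond a single translation, the sketch below warps a reference frame with a dense flow field in which every pixel carries its own displacement. It uses integer displacements and nearest-pixel sampling for brevity; the `warp` helper and the toy frames are assumptions, and the learned compensation described in the paper is more general than this.

```python
from typing import List, Tuple

Frame = List[List[float]]
Flow = List[List[Tuple[int, int]]]  # per-pixel (dy, dx) displacement


def warp(reference: Frame, flow: Flow) -> Frame:
    """Predict a frame by sampling the reference at (y + dy, x + dx) per pixel.

    Out-of-range coordinates are clamped to the frame border.
    """
    h, w = len(reference), len(reference[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = reference[sy][sx]
    return out


reference = [[1.0, 2.0, 3.0],
             [4.0, 5.0, 6.0]]

# Uniform flow: every pixel samples one step to its left, as a block translation would.
uniform = [[(0, -1)] * 3 for _ in range(2)]
# Non-uniform flow: the two rows sample in opposite directions,
# which a single translation for the whole block cannot express.
per_row = [[(0, -1)] * 3, [(0, 1)] * 3]

shifted = warp(reference, uniform)
sheared = warp(reference, per_row)
print(shifted)
print(sheared)
```

The second call shows the point of a dense field: different parts of the frame are compensated differently within one prediction step.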

Moreover, the algorithm compresses optical flow and residuals jointly, which lets the bitrate budget be shared between these signals as the content requires, optimizing for higher fidelity.
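
One way to read joint compression is through the standard rate-distortion objective used in learned compression. The form below is a generic sketch and not necessarily the paper's exact loss; the symbols $x_t$ (input frame), $\hat{x}_t$ (reconstruction), $c_t$ (the single code carrying both flow and residual information), and $\lambda$ (trade-off weight) are assumptions introduced here.

```latex
\mathcal{L} \;=\; \sum_{t} \Big[ D\big(x_t, \hat{x}_t\big) \;+\; \lambda \, R\big(c_t\big) \Big]
```

Because the rate term $R(c_t)$ is charged on one code rather than on separate flow and residual streams, the optimization is free to decide how much of each frame's bits goes to motion and how much to the residual.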

Theoretical and Practical Implications

Practically, the proposed ML-based approach is a step toward serving diverse video use cases that traditional codecs struggle with, including virtual reality and social media applications. Theoretically, it is a substantial demonstration of adaptive, learned behavior in a system traditionally built from hand-designed algorithms, expanding the scope of ML in computer vision applications.

As video traffic continues to grow in volume and heterogeneity, an ML-driven codec offers notable benefits, including reduced bandwidth usage and more effective use of network resources. This matters because video dominates internet data consumption.

Speculations on Future Developments

Looking ahead, this research motivates broader adoption of ML frameworks in video streaming and other low-latency applications. Future work might explore real-time implementations and extend the approach to B-frame coding, which would make it applicable to more general video compression settings.

Improved computational efficiency is essential for real-time use, which suggests further optimization and deployment on specialized hardware accelerators. As ML models for video coding mature, there is potential to push compression performance beyond present capabilities.

In conclusion, the paper offers a robust framework for applying ML to video compression and may spur further models aimed at more efficient and visually superior coding. The implications reach both academia and industry as ML becomes an increasingly important component of emerging video technologies.