
Full Resolution Image Compression with Recurrent Neural Networks (1608.05148v2)

Published 18 Aug 2016 in cs.CV

Abstract: This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study "one-shot" versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.

Citations (793)

Summary

  • The paper presents RNN-based encoder-decoder architectures that achieve 4.3%-8.8% AUC improvements over prior neural-network compression work, measured with the MS-SSIM and PSNR-HVS perceptual metrics.
  • It introduces an encoder-decoder framework with a binarizer and entropy coding, enabling variable compression rates without retraining.
  • Empirical evaluations on the Kodak dataset show the models outperforming JPEG across most bitrates, with and without entropy coding.

Full Resolution Image Compression with Recurrent Neural Networks

This paper explores innovative methods for full-resolution lossy image compression leveraging recurrent neural networks (RNNs). It highlights advancements in using neural networks for competitive compression across diverse rates and resolutions.

Image Compression Methodology

The proposed architecture integrates an RNN-based encoder-decoder framework, a binarizer, and an entropy coding network. One notable feature is the ability to achieve variable compression rates without retraining: running more iterations of the recurrent model emits more bits and yields a higher-fidelity reconstruction. At each iteration the encoder compresses the current residual, the binarizer quantizes the encoding into a compact set of bits, and the decoder produces an image estimate; the remaining residual (input minus reconstruction) is fed to the next iteration, as sketched below. Entropy coding of the resulting bits further improves the rate-distortion trade-off.
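
The loop below is a minimal PyTorch sketch of that iterative residual pipeline. The Encoder, Binarizer, and Decoder modules are simplified stand-ins (plain convolutions plus a straight-through sign quantizer) rather than the paper's convolutional LSTM/GRU layers, and compress_iteratively is a hypothetical helper name; the sketch only illustrates the additive encode-binarize-decode-subtract cycle and the way iteration count controls the bitrate.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Simplified stand-in: the paper uses stacked convolutional RNN layers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Binarizer(nn.Module):
    """Projects features to [-1, 1] and quantizes to {-1, +1} with a
    straight-through estimator (hard sign forward, identity gradient)."""
    def __init__(self, in_ch, code_ch):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, code_ch, kernel_size=1)

    def forward(self, x):
        z = torch.tanh(self.proj(x))
        return z + (torch.sign(z) - z).detach()

class Decoder(nn.Module):
    """Simplified stand-in that upsamples binary codes back to image space."""
    def __init__(self, code_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(code_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, b):
        return self.net(b)

def compress_iteratively(image, encoder, binarizer, decoder, num_iters=4):
    """Additive reconstruction: each step encodes the current residual,
    emits binary codes, and adds the decoded correction to the output.
    More iterations -> more bits -> higher reconstruction quality."""
    residual = image
    reconstruction = torch.zeros_like(image)
    codes = []
    for _ in range(num_iters):
        bits = binarizer(encoder(residual))
        codes.append(bits)
        reconstruction = reconstruction + decoder(bits)
        residual = image - reconstruction
    return codes, reconstruction

encoder, binarizer, decoder = Encoder(), Binarizer(128, 32), Decoder(32)
image = torch.rand(1, 3, 64, 64) * 2 - 1
codes, recon = compress_iteratively(image, encoder, binarizer, decoder)
print(len(codes), recon.shape)  # 4 code tensors, reconstruction at input resolution
```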

RNN Variants and Innovations

  • The study compares LSTM, associative LSTM, and a newly proposed hybrid of Gated Recurrent Unit (GRU) and ResNet structures.
  • A novel scaled-additive reconstruction framework is introduced, which improves rate-distortion performance as measured by MS-SSIM and PSNR-HVS (Figure 1); the reconstruction modes are contrasted in the sketch below.
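
The snippet below is a toy NumPy sketch contrasting the three reconstruction modes on per-iteration decoder outputs. The function name reconstruct and its gains argument are illustrative, and the scaled-additive branch only gestures at the idea of content-dependent scaling; it is not the paper's exact formulation.

```python
import numpy as np

def reconstruct(step_outputs, mode="additive", gains=None):
    """Toy contrast of the three reconstruction modes, applied to a list of
    per-iteration decoder outputs. Schematic only, not the paper's exact math."""
    if mode == "one_shot":
        # Each iteration predicts the full image; the final prediction is used.
        return step_outputs[-1]
    if mode == "additive":
        # Each iteration predicts a correction; the corrections are summed.
        return np.sum(step_outputs, axis=0)
    if mode == "scaled_additive":
        # Corrections are rescaled by content-dependent gains before summing.
        return np.sum([g * out for g, out in zip(gains, step_outputs)], axis=0)
    raise ValueError(f"unknown mode: {mode}")

steps = [np.random.randn(8, 8) * 0.5 ** t for t in range(4)]
print(reconstruct(steps, "one_shot").shape)
print(reconstruct(steps, "additive").shape)
print(reconstruct(steps, "scaled_additive", gains=[1.0, 0.5, 0.25, 0.125]).shape)
```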

Figure 1: Rate distortion curve on the Kodak dataset given as MS-SSIM vs. bit per pixel (bpp).

Compression Results

The empirical results demonstrate the RNN's ability to surpass JPEG in compression efficiency, particularly on the Kodak dataset in terms of both MS-SSIM and PSNR-HVS metrics. This performance persists across various bitrates, supported by the flexibility and robustness of the RNN architectures.

Key Findings

  • The study reports improvements of 4.3%-8.8% in rate-distortion AUC over previous neural-network compression approaches, depending on the perceptual metric; a sketch of how such an AUC comparison can be computed follows this list.
  • Variants of the proposed models consistently outperform JPEG, most visibly in the distortion reduction achieved at comparable bitrates (Figure 2).
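
As a rough illustration of an area-under-the-curve comparison between rate-distortion curves, the sketch below integrates quality vs. bits-per-pixel with the trapezoid rule up to a common bitrate cap. The curve values are made up for illustration and are not numbers from the paper.

```python
import numpy as np

def rd_auc(bpp, quality, max_bpp=2.0):
    """Area under a rate-distortion curve (quality vs. bits per pixel),
    integrated up to a shared bitrate cap so that curves are comparable."""
    bpp, quality = np.asarray(bpp, dtype=float), np.asarray(quality, dtype=float)
    mask = bpp <= max_bpp
    return np.trapz(quality[mask], bpp[mask])

# Hypothetical numbers for illustration only; not values from the paper.
jpeg_bpp,  jpeg_msssim  = [0.25, 0.5, 1.0, 2.0], [0.86, 0.91, 0.950, 0.975]
model_bpp, model_msssim = [0.25, 0.5, 1.0, 2.0], [0.89, 0.93, 0.960, 0.980]

baseline = rd_auc(jpeg_bpp, jpeg_msssim)
ours = rd_auc(model_bpp, model_msssim)
print(f"AUC improvement: {100 * (ours - baseline) / baseline:.1f}%")
```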

Figure 2: Rate distortion curve on the Kodak dataset given as PSNR-HVS vs. bit per pixel (bpp).

Implementation Considerations

Deploying this framework involves practical considerations such as the computational cost of training and inference and scalability to diverse datasets and resolutions. The reliance on entropy coding for further gains underscores that the binary codes still contain statistical redundancy which a learned context model can remove; a small sketch of this idea follows below.
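
To make the role of the entropy model concrete, the sketch below estimates the expected compressed size of the binary codes from a context model's predicted probabilities, i.e. the cross-entropy bound that an arithmetic coder can approach. It is a schematic stand-in for the paper's recurrent entropy coder, with hypothetical names and toy probabilities.

```python
import numpy as np

def estimated_bits(bits, prob_of_one):
    """Expected code length, in bits, when arithmetic-coding binary symbols
    with a context model that predicts P(bit = 1) for each position.
    Better predictions -> lower cross-entropy -> smaller compressed size."""
    bits = np.asarray(bits, dtype=np.float64)
    p = np.clip(np.asarray(prob_of_one, dtype=np.float64), 1e-6, 1 - 1e-6)
    return float(np.sum(-(bits * np.log2(p) + (1 - bits) * np.log2(1 - p))))

codes = np.random.randint(0, 2, size=1024)       # binarizer output mapped to {0, 1}
uniform = np.full(codes.shape, 0.5)              # no context model: ~1 bit per symbol
informed = np.where(codes == 1, 0.9, 0.1)        # an (idealized) well-calibrated model
print(estimated_bits(codes, uniform), estimated_bits(codes, informed))
```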

Future Prospects

Future developments could target tighter integration of these neural architectures with codec-based image formats such as WebP (built on the VP8 video codec's intra-frame coding), which exploit spatial redundancy. Additionally, refining perceptual metrics to better correlate with human visual perception remains a critical area for improving the perceptual quality of compressed outputs.

Conclusion

This exploration into RNN-based image compression underscores its potential to outperform traditional methods like JPEG by leveraging the adaptability of neural networks across compression rates. The findings illustrate promising directions for future enhancements in neural compression, particularly through refined entropy coding and potential fusion with video codec technologies (Figure 3).

Figure 3: Comparison of compression results on Kodak Image 5 at various bitrates, showcasing the superior performance of the Residual GRU (One Shot) method.
