
Towards Extreme Image Compression with Latent Feature Guidance and Diffusion Prior

(2404.18820)
Published Apr 29, 2024 in eess.IV and cs.CV

Abstract

Compressing images at extremely low bitrates (below 0.1 bits per pixel (bpp)) is a significant challenge due to substantial information loss. Existing extreme image compression methods generally suffer from heavy compression artifacts or low-fidelity reconstructions. To address this problem, we propose a novel extreme image compression framework that combines compressive VAEs and pre-trained text-to-image diffusion models in an end-to-end manner. Specifically, we introduce a latent feature-guided compression module based on compressive VAEs. This module compresses images and initially decodes the compressed information into content variables. To enhance the alignment between content variables and the diffusion space, we introduce external guidance to modulate intermediate feature maps. Subsequently, we develop a conditional diffusion decoding module that leverages pre-trained diffusion models to further decode these content variables. To preserve the generative capability of pre-trained diffusion models, we keep their parameters fixed and use a control module to inject content information. We also design a space alignment loss to provide sufficient constraints for the latent feature-guided compression module. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of both visual performance and image fidelity at extremely low bitrates.

Figure: Pipeline of DiffEIC, showing the latent feature-guided compression stage and the conditional diffusion decoding stage used for image reconstruction.

Overview

  • The paper introduces a method for extreme image compression below 0.1 bits per pixel that combines compressive Variational Autoencoders (VAEs) with pre-trained text-to-image diffusion models, focusing on faithful reconstruction and content preservation.

  • It demonstrates improved perceptual quality and fidelity over existing methods, validated empirically on standard datasets such as Kodak and Tecnick.

  • Future work could integrate the text-to-image capabilities of diffusion models more fully and reduce computational cost to support real-time applications.

Proposed Techniques for Extreme Image Compression Utilizing Latent Feature Guidance and Diffusion Models

Introduction to Extreme Image Compression

Image compression, an essential step for efficient data transmission and storage, has long been served by standards such as JPEG2000 and VVC. However, these conventional codecs falter at extremely low bitrates, producing visually unappealing compression artifacts or overly smooth images. To address this challenge, recent deep learning research has turned to generative models to substantially improve reconstruction quality at low bitrates.
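As a concrete reference point for what "below 0.1 bpp" means, the short sketch below computes bits per pixel from a compressed payload size. The file size and image dimensions are illustrative, not figures from the paper.

```python
# Rough illustration (not from the paper): bits per pixel (bpp) is the
# compressed size in bits divided by the number of pixels in the image.
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    return compressed_bytes * 8 / (width * height)

# A 768x512 Kodak-sized image compressed to roughly 4.8 KB sits just under 0.1 bpp.
print(bits_per_pixel(4800, 768, 512))  # ~0.0977 bpp
```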

Methodology

This paper proposes a method for image compression below 0.1 bits per pixel (bpp) that couples compressive Variational Autoencoders (VAEs) with pre-trained diffusion models, using external guidance to modulate intermediate feature maps. The hybrid approach comprises two main components:

  1. Latent Feature-Guided Compression Module (LFGCM): Utilizing compressive VAEs, this module initially encodes and compresses input images into content variables, preparing them for subsequent decoding. It introduces external guidance to align these variables better with the diffusion spaces, employing transform coding paradigms for initial data reduction.
  2. Conditional Diffusion Decoding Module (CDDM): This module decodes content variables into images using a pre-trained Stable Diffusion model that is kept frozen during training, leveraging its generative prior for reconstruction. Content information is injected through a trainable control module; a simplified sketch of how the two modules could fit together follows this list.
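The following is a minimal, hypothetical PyTorch sketch of the interaction between the two modules, assuming a ControlNet-style control module and a frozen Stable Diffusion VAE/UNet (4-channel latents at 1/8 resolution). Module names, layer shapes, and the form of the space alignment loss are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class LatentFeatureGuidedCompressor(nn.Module):
    """Hypothetical stand-in for the LFGCM: maps an image to a quantized
    representation and decodes it into a content variable c that should lie
    close to the frozen Stable Diffusion latent space (4 channels, 1/8 res)."""
    def __init__(self, latent_ch: int = 4):
        super().__init__()
        self.analysis = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=4, padding=2), nn.GELU(),
            nn.Conv2d(64, latent_ch, 5, stride=2, padding=2))
        self.synthesis = nn.Sequential(
            nn.Conv2d(latent_ch, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, latent_ch, 3, padding=1))

    def forward(self, x):
        y = self.analysis(x)                  # compact code; entropy-coded to a bitstream in practice
        y_hat = y + torch.rand_like(y) - 0.5  # additive-noise proxy for quantization during training
        c = self.synthesis(y_hat)             # content variable fed to the conditional diffusion decoder
        return c, y_hat

# Illustrative use: the content variable is pulled toward the latent z0 that the
# frozen SD VAE would produce for the same image (an assumed form of the space
# alignment loss), while a ControlNet-style module (not shown) injects c into
# the frozen denoising UNet during conditional decoding.
x = torch.randn(1, 3, 512, 512)          # dummy image batch
z0 = torch.randn(1, 4, 64, 64)           # placeholder for the frozen SD VAE latent of x
compressor = LatentFeatureGuidedCompressor()
c, _ = compressor(x)
space_alignment_loss = torch.mean((c - z0) ** 2)
```

Only the compressor and the control module would be trained in this setup; the diffusion backbone stays frozen so its generative capability is preserved, which matches the paper's stated design.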

Empirical Validation

Extensive experiments on standard datasets such as Kodak, Tecnick, and CLIC2020 demonstrate superior performance over existing methods, particularly in preserving perceptual quality and fidelity at extremely low bitrates. Notably, the method outperforms contemporary approaches on perceptual metrics such as LPIPS, FID, and KID, excelling especially where the bitrate budget is tightest.

  • Quantitative Performance: The proposed method achieves substantial bitrate savings at comparable perceptual quality, improving over both traditional codecs and recent learned compression methods.
  • Qualitative Assessments: Visual comparisons further substantiate the quantitative findings, with the proposed method consistently delivering visually pleasing and detailed reconstructions even at bitrates lower than 0.1 bpp.
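For readers reproducing the perceptual evaluation, a minimal sketch using the widely used `lpips` and `torchmetrics` packages is shown below. These are common choices for computing LPIPS and FID, but they are an assumption here, not the paper's exact evaluation code; the random tensors stand in for real reference/reconstruction images.

```python
import torch
import lpips                                                   # pip install lpips
from torchmetrics.image.fid import FrechetInceptionDistance    # pip install torchmetrics[image]

# LPIPS: lower is better; the lpips package expects images scaled to [-1, 1].
lpips_fn = lpips.LPIPS(net='alex')
ref = torch.rand(1, 3, 256, 256) * 2 - 1   # placeholder reference image
rec = torch.rand(1, 3, 256, 256) * 2 - 1   # placeholder reconstruction
print(lpips_fn(ref, rec).item())

# FID: computed over sets of images; default torchmetrics path expects uint8 in [0, 255].
# In practice FID is estimated over many images; 8 samples here only keep the demo fast.
fid = FrechetInceptionDistance(feature=2048)
fid.update((torch.rand(8, 3, 256, 256) * 255).to(torch.uint8), real=True)
fid.update((torch.rand(8, 3, 256, 256) * 255).to(torch.uint8), real=False)
print(fid.compute().item())
```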

Future Perspectives

The fusion of deep learning models for compression, specifically utilizing the generative prowess of diffusion models, marks a promising advancement in the realm of image and video codecs. Future studies might explore:

  • Further integration with text-to-image capabilities of diffusion models to enhance semantic fidelity.
  • Reduction of computational demand and inference time to adapt this methodology for broader, real-time applications.

Conclusion

This research delineates a novel framework for extreme image compression using a combination of compressive autoencoders and diffusion-based decoders enhanced by latent feature guidance. By setting new benchmarks in visual and quantitative metrics at ultra-low bitrates, it paves the way for future developments in efficient and high-quality image compression technologies.
