
InstructIR: High-Quality Image Restoration Following Human Instructions

(2401.16468)
Published Jan 29, 2024 in cs.CV, cs.LG, and eess.IV

Abstract

Image restoration is a fundamental problem that involves recovering a high-quality clean image from its degraded observation. All-In-One image restoration models can effectively restore images from various types and levels of degradation using degradation-specific information as prompts to guide the restoration model. In this work, we present the first approach that uses human-written instructions to guide the image restoration model. Given natural language prompts, our model can recover high-quality images from their degraded counterparts, considering multiple degradation types. Our method, InstructIR, achieves state-of-the-art results on several restoration tasks including image denoising, deraining, deblurring, dehazing, and (low-light) image enhancement. InstructIR improves +1dB over previous all-in-one restoration methods. Moreover, our dataset and results represent a novel benchmark for new research on text-guided image restoration and enhancement. Our code, datasets and models are available at: https://github.com/mv-lab/InstructIR

Overview

  • Introduces InstructIR, a method using human-written instructions for image restoration.

  • Employs natural language processing for task execution based on user prompts.

  • Highlights InstructIR's ability to handle multiple types of image degradations.

  • Reports an improvement of +1dB over prior methods across benchmark tasks.

  • Showcases the potential for more user-friendly, AI-driven image restoration tools.

Introduction

This paper introduces InstructIR, an approach to image restoration that uses human-written instructions as its guiding mechanism. Unlike traditional models that either target a specific type of degradation or handle several degradations through pre-defined guidance vectors, InstructIR relies on natural language processing to understand and execute restoration tasks described in plain-language instructions. Through an extensive set of experiments, the authors validate the efficacy of text guidance, with InstructIR setting new benchmarks across multiple restoration tasks.

Methodology

The work explores the intersection of image restoration and instruction-based guidance. The authors propose a language-informed model that interprets human-written instructions to perform restoration on degraded images. At the core of InstructIR is a text encoder—such as a sentence transformer—that captures the semantics of a user prompt and maps it into an embedding space the image restoration model consumes as a conditioning signal.
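
To make that conditioning mechanism concrete, the sketch below shows how a frozen sentence transformer could turn an instruction into a vector for a restoration network. The encoder name, embedding size, and projection layer are illustrative assumptions, not InstructIR's exact configuration.

```python
# Minimal sketch: encode a restoration instruction into a conditioning vector.
# The model name ("all-MiniLM-L6-v2"), the 384-d embedding size, and the
# projection dimension are assumptions for illustration only.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

text_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # frozen text encoder
project = nn.Linear(384, 256)                            # embedding -> conditioning dimension

prompt = "Please remove the rain streaks from this photo"
with torch.no_grad():
    emb = torch.from_numpy(text_encoder.encode(prompt))  # sentence embedding, shape (384,)
cond = project(emb)                                       # vector injected into the restoration model
```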

The research makes a significant contribution by demonstrating that a single InstructIR model, powered by NAFNet's efficient architecture, can simultaneously address various restoration tasks, such as denoising, deraining, deblurring, dehazing, and low-light enhancement. The underlying method treats instruction-based image restoration as a supervised learning problem, where over 10,000 diverse prompts are first generated using GPT-4 and paired with corresponding degraded images to form a robust training dataset.
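
A minimal training-step sketch of that supervised formulation follows: each sample pairs a degraded image and an instruction with its clean target, and the network is optimized with a simple reconstruction loss. The batch layout, function signatures, and L1 objective are placeholder assumptions rather than the released training code.

```python
# Sketch of one supervised training step for instruction-conditioned restoration.
# The data layout, frozen text encoder, and L1 objective are illustrative
# assumptions, not copied from the official InstructIR codebase.
import torch
import torch.nn.functional as F

def train_step(model, text_encoder, project, batch, optimizer):
    degraded, clean, prompts = batch              # (B,3,H,W) tensors and a list of B instruction strings
    with torch.no_grad():                         # the sentence encoder stays frozen
        emb = torch.from_numpy(text_encoder.encode(prompts))
    cond = project(emb)                           # instruction embeddings -> conditioning vectors
    restored = model(degraded, cond)              # restoration network conditioned on the instruction
    loss = F.l1_loss(restored, clean)             # reconstruction objective (assumed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```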

Results

Empirical results indicate that InstructIR surpasses state-of-the-art methods on several image restoration benchmarks. The authors report an improvement of +1dB over previous all-in-one restoration methods, demonstrating that the model handles multi-degradation settings effectively. InstructIR's flexibility is also evident in how it serves restoration requests that end-users express through free-form instructions.
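
The decibel figures refer to PSNR, the standard fidelity metric in these restoration benchmarks; a generic helper for computing it is sketched below (standard definition, not taken from the paper's evaluation code).

```python
# Generic PSNR helper for images scaled to [0, 1]; a +1 dB gain corresponds to
# the mean squared error dropping by roughly a factor of 10**0.1 ≈ 1.26.
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```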

Implications and Conclusion

The significance of InstructIR lies not only in its performance but also in the paradigm shift it introduces in user interaction with restoration models. The model interprets a vast range of instructions, offering an intuitive interface for non-experts to achieve desired restoration outcomes. By releasing the dataset and articulating a new benchmark for text-guided image restoration, this research paves the way for subsequent exploration and development in the area.

In conclusion, the paper describes a critical advance in leveraging human guidance via natural language prompts to facilitate the challenging task of image restoration. By demonstrating remarkable performance across several benchmark tasks, InstructIR exemplifies the promising synthesis of language understanding and visual data processing, heralding a future where AI-driven image restoration becomes more accessible and user-friendly.
