- The paper demonstrates a dual-stream CNN that predicts local affine coefficients using bilateral grid processing for efficient image enhancement.
- It processes 1080p images at 50 FPS on a Google Pixel phone, achieving PSNR scores of 28.8 dB for HDR+ and 33.5 dB for Local Laplacian filtering.
- The architecture bridges traditional image processing with deep learning to enable advanced, high-fidelity mobile photography.
Deep Bilateral Learning for Real-Time Image Enhancement
The paper "Deep Bilateral Learning for Real-Time Image Enhancement" presents a novel neural network architecture aimed at enabling sophisticated image enhancements to be processed in real-time on mobile devices. Authored by Michaël Gharbi et al., this work facilitates the acceleration of image processing pipelines, making high-resolution image processing both efficient and accessible on contemporary smartphones.
Network Architecture
The proposed architecture is inspired by bilateral grid processing and local affine color transforms. It uses a convolutional neural network (CNN) to predict local affine transformation coefficients within a bilateral grid. A critical design choice is that the network performs almost all of its heavy computation on a low-resolution version of the input image, yet produces high-quality, full-resolution output via slicing, an edge-preserving upsampling operation drawn from the bilateral grid literature and implemented here as a differentiable network layer.
The architecture is segmented into two primary streams:
- Low-Resolution Stream: Processes a downsampled version of the input image to predict a bilateral grid of affine coefficients.
- High-Resolution Stream: Utilizes a guidance map to apply these coefficients in an edge-aware manner to the full-resolution image.
This dual-stream design combines the efficiency of low-resolution inference with the precision of full-resolution output; a minimal sketch of the slicing step that joins the two streams follows.
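To make the slicing step concrete, here is a NumPy sketch of the lookup, assuming a small grid of 3x4 affine coefficients (e.g., 16x16 spatial cells with 8 depth bins) and a single-channel guidance map in [0, 1]. The function name `slice_grid` and the brute-force trilinear loop are illustrative; the paper's implementation runs as a fused GPU operation, not pixel-by-pixel NumPy.

```python
import numpy as np

def slice_grid(grid, guide):
    """Trilinearly interpolate per-pixel affine coefficients out of a
    bilateral grid, using the guidance map to pick the depth slice.

    grid  : (gh, gw, gd, 12) affine coefficients from the low-res stream
            (each cell holds a 3x4 color transform).
    guide : (H, W) guidance map in [0, 1] at full resolution.
    returns (H, W, 12) coefficients, one transform per output pixel.
    """
    gh, gw, gd, nc = grid.shape
    H, W = guide.shape

    # Continuous grid coordinates for every full-resolution pixel.
    ys = np.broadcast_to(np.linspace(0, gh - 1, H)[:, None], (H, W))
    xs = np.broadcast_to(np.linspace(0, gw - 1, W)[None, :], (H, W))
    zs = guide * (gd - 1)  # data-dependent lookup: intensity selects depth

    y0, x0, z0 = (np.floor(a).astype(int) for a in (ys, xs, zs))
    out = np.zeros((H, W, nc))
    for dy in (0, 1):  # accumulate the 8 trilinear corners
        for dx in (0, 1):
            for dz in (0, 1):
                w = (np.clip(1 - np.abs(ys - (y0 + dy)), 0, 1)
                     * np.clip(1 - np.abs(xs - (x0 + dx)), 0, 1)
                     * np.clip(1 - np.abs(zs - (z0 + dz)), 0, 1))
                out += w[..., None] * grid[np.clip(y0 + dy, 0, gh - 1),
                                           np.clip(x0 + dx, 0, gw - 1),
                                           np.clip(z0 + dz, 0, gd - 1)]
    return out
```

Because the depth coordinate comes from the guidance map rather than pixel position, pixels on opposite sides of an intensity edge read from different grid cells, which is what keeps the upsampled coefficients edge-aware.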
Performance and Results
One of the standout results is the ability to process 1080p images in real time (50 FPS) on a Google Pixel phone. This performance rests on two novel layers introduced within the network:
- Slicing Node: Performs data-dependent lookups in the bilateral grid, yielding edge-aware upsampling.
- Multiplicative Node: Applies the local affine color transformations predicted by the network (sketched after this list).
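Under the same assumptions as before, the multiplicative step amounts to viewing the sliced coefficients as per-pixel 3x4 matrices and applying them to homogeneous RGB values. `apply_affine`, the luma stand-in for the learned guidance map, and the random grid in the usage lines are illustrative, not the paper's code:

```python
import numpy as np

def apply_affine(coeffs, image):
    """Apply one 3x4 affine color transform per pixel.

    coeffs : (H, W, 12) sliced coefficients, viewed as 3x4 matrices.
    image  : (H, W, 3) full-resolution RGB input in [0, 1].
    returns (H, W, 3) enhanced output.
    """
    A = coeffs.reshape(*coeffs.shape[:2], 3, 4)
    ones = np.ones(image.shape[:2] + (1,))
    x = np.concatenate([image, ones], axis=-1)  # homogeneous [r, g, b, 1]
    return np.einsum('hwij,hwj->hwi', A, x)

# Usage with stand-in inputs (in the real model a CNN predicts `grid`
# and the guidance map itself is learned; plain luma is used here):
img = np.random.rand(540, 960, 3)
guide = img.mean(axis=-1)
grid = np.random.rand(16, 16, 8, 12)
out = apply_affine(slice_grid(grid, guide), img)  # slice_grid from above
```

Keeping the per-pixel operation affine is what lets all the expensive, nonlinear reasoning stay in the low-resolution stream: at full resolution the network only ever does a lookup and a small matrix multiply.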
The network was trained on several datasets, including the MIT-Adobe "FiveK" dataset, to learn a range of photographic effects. The results show that the model reproduces both algorithmic image operators (e.g., HDR+, the Local Laplacian filter) and subjective human retouching with high fidelity.
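The training signal is straightforward: the network is fit to input/output pairs, penalizing the pixel-wise difference between its full-resolution result and the reference produced by the operator or retoucher being mimicked. A minimal sketch, assuming a plain L2 objective (the paper's exact loss formulation may differ in its details):

```python
import numpy as np

def mimic_loss(pred, target):
    """Pixel-wise L2 between the network output and the reference edit.
    `target` comes from running the operator being learned (HDR+, Local
    Laplacian, or a human retouch) on the same input image."""
    return np.mean((pred - target) ** 2)
```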
Numerical Evaluation
The authors provide comprehensive numerical evaluations, demonstrating the model’s competitive performance against existing techniques like Bilateral Guided Upsampling (BGU) and Transform Recipes (TR):
- The model achieved PSNR scores of 28.8 dB for HDR+ and 33.5 dB for Local Laplacian filtering, showing that it approximates complex, expensive operators faithfully at real-time rates.
- In user-style learning tasks, the model outperforms the methods of Yan et al. and Hwang et al., with a lower mean L2 error in L*a*b* space (both metrics are sketched below).
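For reference, both reported metrics are simple to compute. A sketch assuming images in [0, 1] and scikit-image for the L*a*b* conversion; the helper names are illustrative:

```python
import numpy as np
from skimage.color import rgb2lab  # assumes scikit-image is available

def psnr(result, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, 1]."""
    mse = np.mean((result - reference) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def mean_lab_error(result, reference):
    """Mean per-pixel Euclidean error in L*a*b* space."""
    diff = rgb2lab(result) - rgb2lab(reference)
    return np.mean(np.linalg.norm(diff, axis=-1))
```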
Implications and Future Directions
Practically, this work opens new possibilities for mobile photography, allowing sophisticated post-processing techniques to be executed in real-time. The implications extend to numerous applications, including computational photography, mobile photo editing, and adaptive graphics rendering. Theoretically, it introduces an efficient hybrid approach that combines traditional image processing paradigms with contemporary deep learning techniques.
Future developments could include:
- Extending the model to handle spatial warping and edge generation.
- Exploring applications beyond static image enhancement, such as real-time video processing.
- Further reducing computational overhead and memory usage to accommodate even more resource-constrained environments.
Overall, the architecture offers a compelling solution for real-time image enhancement, making substantial strides in both the efficiency and the quality of computational photography on mobile devices. Open-sourcing the model would likely stimulate further research and practical adoption in the field.