- The paper introduces a novel two-stage CNN approach that combines disparity estimation and color prediction to synthesize new views from sparse light field images.
- It leverages deep learning to predict dense disparity maps without explicit ground truth, achieving superior PSNR and SSIM results in challenging scenes.
- This efficient method could inform future consumer light field camera designs by reducing the required angular resolution, freeing sensor pixels for higher spatial resolution and better overall image quality.
Learning-Based View Synthesis for Light Field Cameras
The paper "Learning-Based View Synthesis for Light Field Cameras" presents a method to enhance the angular and spatial resolution trade-off inherent in consumer light field cameras by utilizing machine learning techniques. The authors propose a novel approach to synthesize new views from a sparse set of input images, specifically utilizing convolutional neural networks (CNNs) to perform disparity and color estimation.
Methodology
The proposed technique is structured around two primary components: the disparity estimator and the color predictor. This two-stage design leverages the success of deep learning in computer vision while avoiding the difficulty of training a single monolithic network to map input views directly to the novel view; the two components remain coupled, since both are trained on the final synthesis error.
- Disparity Estimation: The disparity estimator uses a CNN to predict a dense disparity map at the novel view position. Its input features are the per-pixel mean and standard deviation computed across the input images warped to the novel view at a range of candidate disparity levels. The network is trained without explicit ground-truth disparities; instead, it minimizes the synthesis error directly, thereby aligning the disparity estimation with the final view synthesis task (a sketch of this feature construction follows the list).
- Color Prediction: The estimated disparity map is used to warp the input images to the desired novel view. The color predictor CNN then synthesizes the final image from these warped views, together with additional inputs such as the disparity map and the novel view's angular position. This component learns to handle occlusions and other complexities not easily captured by traditional warping-and-blending methods (see the second sketch below).
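The feature construction for the disparity stage can be illustrated with a short sketch. The code below is a minimal, illustrative reconstruction rather than the authors' implementation: the helper names (`warp_view`, `disparity_features`), the example view positions, and the disparity range are assumptions, and the CNN that consumes the feature volume is omitted.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_view(img, view_pos, novel_pos, disparity):
    """Shift a grayscale input view (H, W) toward the novel view position,
    assuming a single constant disparity level; bilinear resampling."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dv = (view_pos[0] - novel_pos[0]) * disparity  # vertical shift
    du = (view_pos[1] - novel_pos[1]) * disparity  # horizontal shift
    return map_coordinates(img, np.stack([ys + dv, xs + du]),
                           order=1, mode="nearest")

def disparity_features(views, positions, novel_pos, levels):
    """Per-pixel mean and standard deviation of the warped inputs at each
    candidate disparity level; the resulting (H, W, 2 * len(levels)) volume
    is the kind of input the disparity CNN operates on."""
    channels = []
    for d in levels:
        warped = np.stack([warp_view(v, p, novel_pos, d)
                           for v, p in zip(views, positions)])
        channels.append(warped.mean(axis=0))
        channels.append(warped.std(axis=0))
    return np.stack(channels, axis=-1)

# Hypothetical usage: four corner views of an 8x8 grid, a novel view at
# angular position (3, 4), and an assumed set of candidate disparity levels.
corner_positions = [(0, 0), (0, 7), (7, 0), (7, 7)]
corner_views = [np.random.rand(64, 64) for _ in corner_positions]
features = disparity_features(corner_views, corner_positions,
                              novel_pos=(3, 4),
                              levels=np.linspace(-4, 4, 20))
```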
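The warping that feeds the color predictor can be sketched in the same way, as a backward warp driven by the dense disparity map from the first stage. This continues the sketch above (reusing its imports); the function names and the exact channel layout handed to the color CNN are again hypothetical, and the color network itself is not shown.

```python
def warp_by_disparity(img, view_pos, novel_pos, disparity_map):
    """Backward-warp one input view to the novel view using the dense,
    per-pixel disparity map produced by the first stage."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dv = (view_pos[0] - novel_pos[0]) * disparity_map
    du = (view_pos[1] - novel_pos[1]) * disparity_map
    return map_coordinates(img, np.stack([ys + dv, xs + du]),
                           order=1, mode="nearest")

def color_network_input(views, positions, novel_pos, disparity_map):
    """Stack the warped views with the disparity map and the novel view's
    angular coordinates; a color CNN would regress the final image from a
    volume of this kind (the network itself is omitted here)."""
    warped = [warp_by_disparity(v, p, novel_pos, disparity_map)
              for v, p in zip(views, positions)]
    h, w = disparity_map.shape
    pos = [np.full((h, w), novel_pos[0], dtype=float),
           np.full((h, w), novel_pos[1], dtype=float)]
    return np.stack(warped + [disparity_map] + pos, axis=-1)
```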
Experimental Findings
Testing and validation were conducted on light fields captured with Lytro Illum cameras, using only the four corner sub-aperture images to synthesize the views of an 8x8 grid. The authors report that their method outperforms existing state-of-the-art techniques both visually and numerically, with higher PSNR and SSIM values, especially in scenes with challenging occlusions.
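For reference, PSNR and SSIM comparisons of the kind reported here can be computed with standard library routines. The snippet below is a generic evaluation sketch, not the authors' code; it assumes images normalized to [0, 1] and uses the scikit-image implementations of both metrics.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_view(synthesized: np.ndarray, reference: np.ndarray):
    """Return (PSNR in dB, SSIM) for one synthesized view against ground truth,
    assuming float images in [0, 1] with a trailing channel axis."""
    psnr = peak_signal_noise_ratio(reference, synthesized, data_range=1.0)
    ssim = structural_similarity(reference, synthesized,
                                 channel_axis=-1, data_range=1.0)
    return psnr, ssim
```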
Implications and Future Directions
This learning-based approach may significantly influence consumer light field camera design, since reducing the required angular resolution frees sensor resolution for higher spatial resolution. The method is also reasonably efficient, synthesizing each novel view in approximately 12.3 seconds, which suggests practical applications in both consumer and professional light field imaging.
Future research may explore adaptations of this methodology to handle unstructured light fields with larger disparity ranges, or extend these techniques to real-time applications. Additionally, integrating this view synthesis approach with existing light field compression schemes could enhance data compression efficiency, further broadening the utility of light field cameras.
In conclusion, the paper presents a compelling method that effectively leverages deep learning to overcome traditional limitations in light field view synthesis, marking a significant step toward more accessible and higher-quality light field imaging solutions.