
Occlusion Aware Unsupervised Learning of Optical Flow (1711.05890v2)

Published 16 Nov 2017 in cs.CV

Abstract: It has been recently shown that a convolutional neural network can learn optical flow estimation with unsupervised learning. However, the performance of the unsupervised methods still has a relatively large gap compared to its supervised counterpart. Occlusion and large motion are some of the major factors that limit the current unsupervised learning of optical flow methods. In this work we introduce a new method which models occlusion explicitly and a new warping way that facilitates the learning of large motion. Our method shows promising results on Flying Chairs, MPI-Sintel and KITTI benchmark datasets. Especially on KITTI dataset where abundant unlabeled samples exist, our unsupervised method outperforms its counterpart trained with supervised learning.

Citations (303)

Summary

  • The paper introduces an unsupervised deep learning method using CNNs that explicitly models occlusion and employs an enhanced warping strategy to effectively handle large motions in optical flow estimation.
  • Experimental results show the method achieves superior performance over existing unsupervised techniques on benchmark datasets like KITTI, even outperforming some supervised methods.
  • This unsupervised approach reduces the dependency on costly labeled data and shows potential for application in areas like autonomous systems by improving flow estimation in dynamic environments.

Insights into "Occlusion Aware Unsupervised Learning of Optical Flow"

The paper, "Occlusion Aware Unsupervised Learning of Optical Flow," presents an approach to key challenges in optical flow estimation via unsupervised learning. Optical flow, a fundamental problem in computer vision, involves predicting the per-pixel motion between two consecutive video frames. The research leverages convolutional neural networks (CNNs) to narrow the performance gap between unsupervised and supervised optical flow estimation, with a specific focus on handling occlusion and large motion.

Key Contributions

This paper introduces a novel method that explicitly models occlusion and implements an advanced warping approach to manage large motion scenarios, which are significant challenges in unsupervised optical flow estimation. Here are the pivotal contributions of this work:

  1. Occlusion Handling: The authors propose a new architecture that integrates an occlusion prediction mechanism to improve flow estimation accuracy. By using backward optical flow to generate occlusion maps, the network mitigates the impact of occluded pixels on optical flow training, thus enhancing flow prediction in regions traditionally leading to errors.
  2. Enhanced Warping Method: A revised warping strategy features prominently in the methodology, expanding the search space during backward warping. This approach effectively addresses the problem of large motion, allowing the network to better learn by capturing distant pixel correspondences.
  3. Network Modifications: The proposed system extends the existing FlowNetS structure by introducing additional inputs during the decoder phase. This enhances the network's capacity to refine coarse-to-fine motion details, further improving flow estimation.
  4. Preprocessing Techniques: Integrating histogram equalization and channel representations as preprocessing steps results in improved contrast and feature representation, which benefits flow estimation.
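To make the occlusion handling in item 1 concrete, the sketch below estimates an occlusion mask with a forward-backward consistency check, a common way to derive occlusion from the backward flow. This is an illustration of the general idea, not the authors' implementation; the thresholds `alpha1` and `alpha2` are conventional illustrative values, not taken from the paper.

```python
import numpy as np

def occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Estimate occluded pixels via forward-backward consistency.

    flow_fw, flow_bw: (H, W, 2) forward and backward flow fields.
    Returns a boolean mask; True marks likely-occluded pixels,
    which would then be excluded from the photometric loss.
    """
    H, W = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    # Look up the backward flow at each forward-flow target location
    # (nearest-neighbour sampling keeps the sketch short).
    tx = np.clip(np.rint(xs + flow_fw[..., 0]), 0, W - 1).astype(int)
    ty = np.clip(np.rint(ys + flow_fw[..., 1]), 0, H - 1).astype(int)
    bw_at_target = flow_bw[ty, tx]
    # For non-occluded pixels, the forward flow and the backward flow
    # at its target should roughly cancel out.
    sq_diff = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    sq_mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_target ** 2, axis=-1)
    return sq_diff > alpha1 * sq_mag + alpha2
```

Pixels flagged by the mask contribute nothing to the data term during training, which is what prevents occluded regions from corrupting the learned flow.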
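Item 2 builds on standard backward warping, in which the second frame is bilinearly sampled at flow-displaced coordinates to reconstruct the first frame. The minimal NumPy sketch below shows that baseline operation; the paper's enlarged-search-space variant extends it and is not reproduced here.

```python
import numpy as np

def backward_warp(img2, flow):
    """Bilinearly sample img2 (H, W, C) at positions displaced by the
    flow (H, W, 2), reconstructing frame 1 from frame 2."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)
    # Target coordinates, clipped to the image border.
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx)[..., None] * img2[y0, x0] + wx[..., None] * img2[y0, x1]
    bot = (1 - wx)[..., None] * img2[y1, x0] + wx[..., None] * img2[y1, x1]
    return (1 - wy)[..., None] * top + wy[..., None] * bot
```

Because plain bilinear warping only sees a 2x2 neighbourhood around each target, gradients vanish when the true displacement is far from the current estimate; that is the limitation the paper's wider search space is designed to address.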
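The histogram equalization mentioned in item 4 is a standard contrast-stretching step. A minimal 8-bit grayscale version, shown only to illustrate the preprocessing idea (the paper's exact pipeline may differ), looks like this:

```python
import numpy as np

def hist_equalize(gray):
    """Histogram-equalize an 8-bit grayscale image (H, W, uint8)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each intensity through the normalized CDF; the max() guards
    # against division by zero on a constant image.
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]
```

Spreading intensities across the full range makes the brightness-constancy loss less sensitive to globally dark or low-contrast frames.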

Experimental Results

The effectiveness of the proposed method is validated through experiments on standard optical flow benchmark datasets: Flying Chairs, MPI-Sintel, and KITTI. The performance is quantitatively measured using endpoint error (EPE), where the proposed unsupervised approach exhibits superior performance compared to existing unsupervised methods. Notably, it outperforms some supervised algorithms, particularly on the challenging KITTI dataset, indicating the robustness of the method when trained with abundant unlabeled data.

  • On the KITTI dataset, the unsupervised network trained without using ground-truth data demonstrates a significant reduction in EPE, outperforming its supervised counterparts. This is particularly notable since KITTI features abundant unlabeled data, allowing the unsupervised method to fully leverage its architectural strengths.
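The endpoint error used above is simply the mean Euclidean distance between predicted and ground-truth flow vectors, in pixels:

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt):
    """Average endpoint error (EPE) between two (H, W, 2) flow fields:
    the mean per-pixel Euclidean distance, in pixels."""
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))
```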

Implications and Future Work

The advancements presented in this paper have substantial implications for both practical optical flow applications and theoretical explorations in unsupervised learning paradigms. Practically, improving unsupervised methods reduces dependency on labeled datasets, which are costly and time-consuming to procure, especially with real-world scenes. Theoretically, this research opens avenues for future enhancements in neural architectures to further close the performance gap between unsupervised and supervised learning.

Future work may explore the integration of additional temporal and spatial cues into the network, potentially improving robustness to occlusion and motion blur even further. Extending these methods to full 3D motion analysis could also enable broader applications in autonomous systems and robotics, particularly in dynamic and complex environments.

In conclusion, this paper represents a significant advancement in unsupervised optical flow estimation, achieving competitive results that challenge conventional supervised methods. The paper underscores the potential of deep neural networks tailored to address specific challenges, such as occlusion and large motion, highlighting a promising direction for ongoing research and development in computer vision.