
Sparsity Invariant CNNs (1708.06500v2)

Published 22 Aug 2017 in cs.CV

Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.

Authors (6)
  1. Jonas Uhrig (4 papers)
  2. Nick Schneider (9 papers)
  3. Lukas Schneider (18 papers)
  4. Uwe Franke (14 papers)
  5. Thomas Brox (134 papers)
  6. Andreas Geiger (136 papers)
Citations (786)

Summary

  • The paper introduces a sparse convolution layer that adapts weights based on valid input data to improve depth upsampling performance.
  • The paper constructs a large-scale dense depth dataset with 93k images, offering a robust benchmark for training and evaluation.
  • The paper demonstrates that its method generalizes across various sparsity levels, outperforming traditional CNNs and RGB-guided techniques.

Sparsity Invariant CNNs

The paper under review, titled "Sparsity Invariant CNNs," presents a new approach to handling sparse data within Convolutional Neural Networks (CNNs). The work addresses a persistent issue in computer vision and robotic perception: depth upsampling from sparse laser scan data, a common scenario in autonomous driving and other robotics applications.

Summary of Contributions

The paper makes several notable contributions:

  1. Sparse Convolution Layer: The authors introduce a novel sparse convolution layer that explicitly accounts for the location of missing data during the convolution operation, adjusting the convolution weights based on the validity of the input pixels (see the equation and sketch after this list).
  2. Dense Depth Dataset: The researchers developed a large-scale dataset derived from the KITTI benchmark that contains 93k depth-annotated RGB images. This dataset facilitates training and evaluating depth upsampling techniques and will be made publicly available.
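Concretely, the sparse convolution normalizes each window's response by the number of observed pixels and propagates the validity mask via max pooling. With input $\mathbf{x}$, binary observation mask $\mathbf{m}$, weights $w$, bias $b$, and a small $\epsilon$ for numerical stability, the paper's definition amounts to:

$$f_{u,v}(\mathbf{x}, \mathbf{m}) = \frac{\sum_{i,j=-k}^{k} m_{u+i,v+j}\, w_{i,j}\, x_{u+i,v+j}}{\sum_{i,j=-k}^{k} m_{u+i,v+j} + \epsilon} + b, \qquad m'_{u,v} = \max_{i,j=-k,\dots,k} m_{u+i,v+j}$$

A minimal PyTorch sketch of such a layer is shown below; the class name `SparseConv2d` and the module structure are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv2d(nn.Module):
    """Convolution normalized by the number of valid pixels per window,
    in the spirit of the paper's sparse convolution (illustrative sketch)."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        padding = kernel_size // 2  # odd kernel sizes assumed
        # Feature convolution without bias; the bias is added after normalization.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=padding, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_channels))
        # Fixed all-ones kernel that counts valid pixels in each window.
        self.register_buffer("ones", torch.ones(1, 1, kernel_size, kernel_size))
        self.pool = nn.MaxPool2d(kernel_size, stride=1, padding=padding)

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W), 1 = observed pixel.
        num = self.conv(x * mask)  # sum of w * x over valid pixels only
        den = F.conv2d(mask, self.ones,
                       padding=self.ones.shape[-1] // 2)  # valid-pixel count
        out = num / den.clamp(min=1e-8) + self.bias.view(1, -1, 1, 1)
        # A window counts as observed if it contains any valid pixel.
        return out, self.pool(mask)
```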

Key Findings and Results

Performance on Sparse Inputs:

  • Standard convolutional networks perform suboptimally on sparse data, even when the location of missing data is known. The proposed sparse convolution network outperforms traditional convolutional networks by explicitly handling sparsity.
  • Experiments conducted on the Synthia dataset demonstrate that the sparse convolutional network generalizes well across different sparsity levels, whereas traditional CNNs fail when evaluated on sparsity levels different from those seen during training (Figure 2); a sketch of this evaluation protocol follows.
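Such cross-sparsity evaluation can be simulated by randomly subsampling a dense depth map at several densities and reusing one trained model across all of them. The helper below is a hypothetical reconstruction of that protocol, not the paper's exact sampling code:

```python
import torch

def random_sparsify(depth, keep_prob):
    """Keep a random fraction `keep_prob` of pixels from a dense depth map,
    mimicking synthetic sparsification of ground-truth depth (hypothetical
    helper; the paper's exact sampling procedure may differ)."""
    mask = (torch.rand_like(depth) < keep_prob).float()
    return depth * mask, mask

# Train at one density, then evaluate the same model across others:
# for p in (0.05, 0.10, 0.25, 0.50):
#     sparse_depth, mask = random_sparsify(dense_depth, p)
#     prediction = model(sparse_depth, mask)
```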

Generalization and Robustness:

  • The sparse convolution network maintains performance across various datasets and levels of sparsity, showcasing its robustness. This is critical for applications like robotics where sensor configurations can change (Table 1).

Comparison with State-of-the-Art Techniques:

  • The proposed method outperforms several depth upsampling and guided filtering techniques that rely on RGB guidance. This indicates that the sparse convolution layer effectively leverages sparse depth information without requiring additional guidance from RGB data (Table 2).

Implications and Future Work

Practical Implications:

  • The ability to handle sparse data effectively, without preprocessing steps such as interpolation or reliance on dense data, opens up new possibilities for perception in robotics and autonomous systems. The sparse convolution layer can be integrated into existing CNN architectures to improve their performance on sparse input data; a minimal example of such a stack follows.
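Because the layer takes and returns a (features, mask) pair, stacking it is straightforward. The sketch below assembles a small sparsity-invariant network from the `SparseConv2d` class sketched earlier; the kernel-size sequence loosely follows the paper's multi-layer depth upsampling network, while the channel widths are illustrative:

```python
import torch
import torch.nn as nn

class SparseDepthNet(nn.Module):
    """Small sparsity-invariant depth upsampling network; layer sizes are
    illustrative and assume the SparseConv2d sketch above is in scope."""

    def __init__(self):
        super().__init__()
        specs = [(1, 16, 11), (16, 16, 7), (16, 16, 5),
                 (16, 16, 3), (16, 16, 3), (16, 1, 1)]
        self.layers = nn.ModuleList(
            SparseConv2d(cin, cout, k) for cin, cout, k in specs)

    def forward(self, sparse_depth, mask):
        x = sparse_depth
        for i, layer in enumerate(self.layers):
            x, mask = layer(x, mask)
            if i < len(self.layers) - 1:  # no activation on the output layer
                x = torch.relu(x)
        return x
```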

Theoretical Implications:

  • This work challenges the traditional approach of handling sparse data within CNNs by introducing a convolution operation that normalizes and adjusts based on data validity. This ensures that the learned features are invariant to the degree of input sparsity, which contrasts with the common practice of padding or using default values.

Speculation on Future Developments:

  • The sparse convolution concept can be extended to other CNN architectures and tasks beyond depth upsampling, including semantic segmentation and 3D object detection. Future research could investigate the integration of sparse convolution layers with network compression techniques to enhance computational efficiency.
  • Another promising direction is the combination of this approach with advanced data augmentation techniques to further improve the robustness and generalization capabilities of vision models operating in varied and unstructured environments.

Dataset and Experimental Validation

Beyond the architectural contribution, the paper introduces a newly annotated large-scale dataset. Derived from KITTI, it provides semi-dense depth annotations for 93k images, offering a valuable resource for the research community. The dataset is particularly important because it enables extensive evaluation of depth prediction techniques in realistic settings, something smaller and less diverse datasets have not supported.

The dataset validation results underscore its quality, demonstrating high accuracy and a significant increase in density compared to the raw LiDAR scans. The qualitative and quantitative comparisons provided affirm that the dataset is reliable and can support the development of more accurate and generalizable depth prediction models (see the figures and tables in the supplementary material).

Conclusion

Overall, the paper presents a significant contribution to the field of computer vision by introducing a novel method for handling sparse data within convolutional neural networks and providing a large-scale, high-quality dataset. The sparse convolution layers developed in this research show promise in enhancing the performance and robustness of perception systems in robotics. By making their dataset publicly available, the authors also contribute valuable resources to the community, potentially driving further advancements in depth perception and related tasks.