3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

Published 28 Nov 2017 in cs.CV | (1711.10275v1)

Abstract: Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SSCNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.

Abstract PDF Upgrade to Chat

Authors (3)

Citations (1,406)

View on Semantic Scholar

Summary

The paper introduces novel submanifold sparse convolutional operations that maintain sparsity in 3D data and enhance segmentation accuracy.
The method utilizes specialized hash tables and rule books to optimize computation on sparse inputs, achieving an Average IoU of 85.98% on the ShapeNet dataset.
The approach offers significant potential for applications in robotics, medical imaging, and autonomous driving where efficient sparse data processing is essential.

3D Semantic Segmentation with Submanifold Sparse Convolutional Networks

Overview

The paper "3D Semantic Segmentation with Submanifold Sparse Convolutional Networks" by Benjamin Graham, Martin Engelcke, and Laurens van der Maaten, presents a novel approach to address the inefficiency of traditional convolutional networks when processing spatially-sparse data. This novel approach involves the introduction of Submanifold Sparse Convolutional Networks (SSCNs), which are specifically tailored to handle sparse data effectively. The primary application demonstrated in the paper is the semantic segmentation of 3D point clouds.

Key Contributions

Novel Sparse Convolutional Operations: The authors introduce two new convolutional operations tailored for sparse data: Sparse Convolution (SC) and Submanifold Sparse Convolution (SSC). These operations significantly reduce the computational overhead typically associated with high-dimensional, sparse datasets by restricting the computation to non-zero values.
Submanifold Sparse Convolutional Networks (SSCNs): SSCNs maintain the data's inherent sparsity throughout the network layers. Unlike traditional convolutions that expand the number of non-zero sites with each layer, SSCs keep the sparse structure intact which in turn optimizes the computational efficiency.
Implementation and Performance: Implementation details are provided, emphasizing the efficiency gains from using hash tables for active sites and specialized rule books for convolutions. This framework achieves strong results on the ShapeNet dataset for 3D segmentation, outperforming state-of-the-art methods.

Numerical Results and Comparative Analysis

A notable competitive performance metric is the Average Intersection-over-Union (IoU) on the ShapeNet part-segmentation dataset. The results in Table \ref{tab:seg_results} underscore the superiority of SSCNs over the alternative methods:

NN matching with Chamfer distance: 77.57%
Synchronized Spectral CNN: 84.74%
Pd-Network: 85.49%
Densely Connected PointNet: 84.32%
PointCNN: 82.29%
Submanifold SparseConvNet: 85.98%

This performance not only demonstrates the accuracy of SSCNs but also their capability to handle high-dimensional sparse data more effectively than dense networks.

Implications and Future Directions

The implications of SSCNs extend beyond semantic segmentation to any domain involving high-dimensional, sparse input data, such as medical imaging, robotics, and autonomous driving. The efficiency gains due to sparse computation can be critical where computational resources are limited or where processing large volumes of data in real-time is essential.

The paper opens several avenues for future research. One direction could be the exploration of SSCNs in various architectures beyond segmentation, such as object detection and classification in 3D spaces. Another interesting line of work might investigate hybrid approaches combining SSCs with traditional dense layers for tasks requiring both local and global feature representations.

Conclusion

In sum, "3D Semantic Segmentation with Submanifold Sparse Convolutional Networks" makes compelling advancements in handling and processing sparse data via novel convolutional operations and network architectures. This work sets a new benchmark for semantic segmentation tasks and provides a solid foundation for future exploration to further optimize and expand the application of sparse convolutional networks in other domains.

Markdown Report Issue