- The paper introduces a novel self-supervised learning framework that predicts complete 3D scenes from partial RGB-D scans.
- It leverages a sparse TSDF representation and a hierarchical network design to achieve high-resolution reconstructions at a 2cm voxel resolution.
- Evaluation on the Matterport3D dataset shows significant error reduction and improved performance in near-surface and unseen space metrics.
Sparse Generative Neural Networks for Self-Supervised Scene Completion
The paper "SG-NN: Sparse Generative Neural Networks for Self-Supervised Scene Completion of RGB-D Scans" presents a novel method for reconstructing high-resolution 3D scenes from partial RGB-D scans using sparse generative neural network architectures. This work focuses on self-supervised learning, eliminating the dependency on synthetic datasets by training directly on real-world, incomplete scan data.
Core Contributions
- Self-Supervised Learning Framework: The authors introduce a self-supervised training scheme that requires only partial scans. Input-target pairs are generated from a single scan by removing frames from the target reconstruction, producing an input that is even more incomplete. Because the loss masks out regions unobserved in the target, the network learns to predict geometry that is missing from the input but observed in the target, without ever seeing complete ground truth.
- Sparse Generative Neural Network (SG-NN): The SG-NN architecture handles large-scale 3D data efficiently by exploiting sparsity. Unlike dense volumetric approaches, SG-NN operates on a sparse TSDF (truncated signed distance function), storing and processing only voxels near the surface, and predicts geometry in a coarse-to-fine manner. This enables output at a 2cm voxel resolution, finer than prior state-of-the-art models that operate at around 5cm.
- Generative Model for High-Resolution Output: The architecture uses a hierarchical, progressively growing design: a low-resolution output is predicted first, and successive levels refine it into finer geometry. This coarse-to-fine generation yields high-quality surfaces, a critical requirement for applications in augmented reality (AR), virtual reality (VR), and robotics.
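The self-supervised pairing described in the first bullet can be sketched as follows. This is a hypothetical illustration, not the authors' code: each RGB-D frame is assumed to contribute a set of observed voxel coordinates, and the `make_train_pair` helper (a name invented here) builds the more-complete target from all frames and the more-incomplete input from a random subset.

```python
import random

def make_train_pair(frame_voxel_sets, keep_ratio=0.5, seed=0):
    """Build a self-supervised (input, target) pair from one partial scan.

    frame_voxel_sets: list of sets of voxel coordinates, one per RGB-D frame.
    Target = union of all frames' observations (less incomplete).
    Input  = union over a random subset of frames (more incomplete).
    """
    target = set().union(*frame_voxel_sets)
    rng = random.Random(seed)
    k = max(1, int(len(frame_voxel_sets) * keep_ratio))
    input_scan = set().union(*rng.sample(frame_voxel_sets, k))
    return input_scan, target
```

The loss would then be evaluated only on voxels observed in the target, so genuinely unobserved space never penalizes the prediction.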
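The sparse TSDF representation from the second bullet can likewise be illustrated with a minimal coordinate-list encoding. This is a sketch under the assumption that only voxels inside the truncation band carry useful signal; the paper's actual sparse-convolution machinery is considerably more involved.

```python
import numpy as np

def sparsify_tsdf(dense_tsdf, truncation=3.0):
    """Convert a dense TSDF grid to sparse (coords, values) form,
    keeping only voxels strictly inside the truncation band."""
    coords = np.argwhere(np.abs(dense_tsdf) < truncation)  # (N, 3) voxel indices
    values = dense_tsdf[tuple(coords.T)]                   # (N,) TSDF values
    return coords, values
```

For real indoor scans the near-surface band occupies only a small fraction of the grid, which is what makes processing at 2cm resolution tractable.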
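A toy version of the hierarchical, progressively growing design in the third bullet might look like this: each level doubles the resolution and then applies a refinement step. The identity `refine` default is a placeholder for what, in the real network, would be a learned sparse prediction.

```python
import numpy as np

def coarse_to_fine(coarse_grid, levels=2, refine=lambda g: g):
    """Progressively grow a prediction: 2x nearest-neighbor upsampling
    per level, followed by a refinement step at the new resolution."""
    grid = coarse_grid
    for _ in range(levels):
        grid = grid.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)
        grid = refine(grid)  # placeholder for a learned sparse predictor
    return grid
```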
Key Findings and Results
The paper demonstrates the effectiveness of SG-NN in self-supervised settings on the Matterport3D dataset. SG-NN achieves a significant reduction in ℓ1 error across metrics capturing different aspects of reconstruction quality, and it particularly outperforms competing methods on the unseen-space and near-surface metrics. In supervised settings on synthetic data with full ground truth, SG-NN remains competitive, further validating its design.
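A simplified version of this kind of masked ℓ1 evaluation might be written as follows. The paper's exact near-surface and unseen-space masks differ; this sketch just restricts the mean absolute error to a boolean mask.

```python
import numpy as np

def masked_l1(pred, target, mask):
    """Mean absolute TSDF error over voxels selected by a boolean mask."""
    diff = np.abs(pred - target)[mask]
    return float(diff.mean()) if diff.size else 0.0
```

Under the assumptions above, a near-surface mask could be something like `np.abs(target) < truncation`, while an unseen-space mask would select voxels absent from the input scan.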
The experimental validation also highlights SG-NN's ability to predict geometry more complete than the training scans themselves, which are incomplete owing to occlusion and sensor limitations. This predictive power stems from correlating patterns of incompleteness across the training set, an advantage over traditional supervised methods that rely on synthetic data with fully observed scenes.
Implications and Future Prospects
Practically, the ability to predict complete scene geometry from incomplete scans without synthetic training data is a substantial advantage: it reduces reliance on costly and often domain-mismatched synthetic data, easing transfer to real-world applications. Moreover, the sparse generative approach opens avenues for real-time scene reconstruction, crucial for robotics and interactive digital environments.
Theoretically, this work deepens our understanding of self-supervised learning for 3D scene completion. Future research could extend sparse generative techniques to the completion of color and semantic information, providing a more holistic scene understanding. Integrating semantic segmentation with geometric completion could also yield better object-scene interaction models, advancing robotic perception and autonomous navigation.
Conclusion
The SG-NN framework represents a substantial advance in 3D scene reconstruction. Its self-supervised, sparse generative approach sidesteps the pitfalls of synthetic-data reliance, making a notable contribution to computer vision and spatial understanding in AI systems. As the field progresses, the insights and methodologies presented here could find broader application across commercial and research domains, refining how AI interprets and interacts with three-dimensional data.