Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation (2210.15858v3)

Published 28 Oct 2022 in cs.CV, cs.GR, and cs.RO

Abstract: In this work, we present a dense tracking and mapping system named Vox-Fusion, which seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by the recently developed implicit mapping and positioning system and further extends the idea so that it can be freely applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to divide the scene and support dynamic expansion, enabling our system to track and map arbitrary scenes without knowing the environment like in previous works. Moreover, we proposed a high-performance multi-process framework to speed up the method, thus supporting some applications that require real-time performance. The evaluation results show that our methods can achieve better accuracy and completeness than previous methods. We also show that our Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.

Authors (6)

Xingrui Yang (10 papers)
Hai Li (159 papers)
Hongjia Zhai (7 papers)
Yuhang Ming (16 papers)
Yuqian Liu (8 papers)
Guofeng Zhang (173 papers)

Summary

The paper introduces Vox-Fusion, a dense SLAM system that combines a voxel-based neural implicit representation with traditional volumetric fusion for real-time tracking and mapping.
It leverages a sparse octree structure and signed distance functions for detailed geometric reconstruction and efficient map updates.
Quantitative evaluations on the Replica dataset demonstrate improved accuracy and robust handling of complex trajectories compared to existing methods.

Vox-Fusion: Advancements in Dense Tracking and Mapping through Voxel-Based Neural Implicit Representation

The paper "Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation" studies how neural implicit representations can be integrated with volumetric SLAM for scene tracking and mapping. The proposed system, Vox-Fusion, bridges traditional SLAM techniques and recent neural implicit methods, and targets real-time dense SLAM applications. The authors address limitations of earlier systems, such as limited scalability and inefficient memory usage, by embedding learned features in a hierarchical voxel data structure.

Methodology

Vox-Fusion uses a voxel-based neural implicit surface representation that encodes and optimizes the scene within each voxel, and a sparse octree that subdivides the scene dynamically. The octree can be expanded on the fly, so the system can map unknown environments without prior knowledge of the scene extent, unlike earlier systems built on fixed-size grids. The scene geometry is modeled as an implicit surface defined by a signed distance function (SDF), which yields detailed reconstructions suitable for AR and VR applications.
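To make the idea of a dynamically expanding, voxel-based feature map more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, as is common for voxel-based implicit maps, that feature vectors live at the eight corners of each voxel and are trilinearly interpolated at a query point before being passed to a decoder network (the decoder is omitted). The class name SparseVoxelMap, the voxel size, and the feature dimension are hypothetical, and a dictionary stands in for the octree.

```python
from itertools import product
from math import floor


class SparseVoxelMap:
    """Toy sparse voxel map: corner features are allocated only where needed.

    Hypothetical illustration; a dict replaces the octree used in the paper.
    """

    def __init__(self, voxel_size=0.2, feat_dim=4):
        self.voxel_size = voxel_size      # edge length in metres (assumed value)
        self.feat_dim = feat_dim          # length of each feature vector (assumed)
        self.corner_feats = {}            # (i, j, k) corner index -> list of floats

    def _base(self, point):
        # Integer index of the voxel that contains `point`.
        return tuple(floor(c / self.voxel_size) for c in point)

    def allocate(self, point):
        """Create the eight corner features of the voxel containing `point`."""
        base = self._base(point)
        for offset in product((0, 1), repeat=3):
            corner = tuple(b + o for b, o in zip(base, offset))
            # Neighbouring voxels share corners, so existing entries are reused.
            self.corner_feats.setdefault(corner, [0.0] * self.feat_dim)
        return base

    def interpolate(self, point):
        """Trilinearly interpolate corner features at `point`.

        Raises KeyError if the containing voxel has not been allocated.
        """
        base = self._base(point)
        frac = [c / self.voxel_size - b for c, b in zip(point, base)]
        out = [0.0] * self.feat_dim
        for offset in product((0, 1), repeat=3):
            corner = tuple(b + o for b, o in zip(base, offset))
            feat = self.corner_feats[corner]
            weight = 1.0
            for f, o in zip(frac, offset):
                weight *= f if o else 1.0 - f
            for d in range(self.feat_dim):
                out[d] += weight * feat[d]
        return out
```

In a full system, allocate would be called for voxels touched by back-projected depth measurements, and the interpolated feature would be decoded into an SDF value and a color. Because storage grows only where surfaces are observed, no bounding volume has to be fixed in advance.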

In Vox-Fusion, the global map grows incrementally: a fusion mechanism integrates new observations from incoming RGB-D frames. The system also uses a multi-process framework that runs tracking and mapping as separate processes, aiming for both accurate 3D reconstruction and computational efficiency. The key innovation is the voxel-based scene representation, which captures fine geometric detail while keeping map updates fast enough for real-time use. This is supported by the combination of learned voxel features and computationally efficient Morton coding.
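Morton coding maps a 3D integer voxel coordinate to a single integer by interleaving the bits of x, y, and z, which makes voxel lookup and octree traversal cheap. The sketch below shows the general technique in plain Python. The function names and the 21-bit default are illustrative choices, not taken from the paper's code, and coordinates are assumed to be non-negative (a real system would first offset negative voxel indices).

```python
def morton_encode(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one Morton code.

    Bit i of x goes to position 3*i, of y to 3*i + 1, of z to 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code: int, bits: int = 21) -> tuple:
    """Invert morton_encode, recovering (x, y, z)."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


def parent_code(code: int) -> int:
    """Morton code of the enclosing voxel one octree level up."""
    return code >> 3
```

A useful property for octrees is that dropping the lowest three bits of a code gives the code of the parent cell, so parent_code(morton_encode(x, y, z)) equals morton_encode(x >> 1, y >> 1, z >> 1).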

Results and Evaluation

The system was tested on the Replica dataset, where it is reported to outperform existing methods such as iMAP and NICE-SLAM in accuracy and reconstruction quality. Quantitative metrics such as absolute trajectory error (ATE) and Chamfer distance were used to measure tracking accuracy and the fidelity of the reconstructed scenes. The system performs notably well at reconstructing thin structures and at keeping the map consistent on looping trajectories, which are a challenge for many SLAM systems.
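For readers unfamiliar with the two metrics, the following plain-Python sketch shows one common way to compute them. It is not the paper's evaluation code. It assumes the estimated and ground-truth trajectories are already time-associated and expressed in the same coordinate frame (ATE is usually computed after a rigid alignment step, omitted here), and it uses the symmetric, unsquared form of the Chamfer distance; other conventions exist. The nearest-neighbour search is brute force and only suitable for small point sets.

```python
from math import dist, sqrt


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between paired camera positions."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [dist(p, q) ** 2 for p, q in zip(estimated, ground_truth)]
    return sqrt(sum(squared) / len(squared))


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets.

    Sum of the mean nearest-neighbour distance from A to B and from B to A.
    """
    if not points_a or not points_b:
        raise ValueError("point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)
```

Lower values are better for both metrics: ATE reflects how closely the tracked camera path follows the true one, while the Chamfer distance reflects how closely points sampled from the reconstructed surface match points sampled from the reference mesh.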

Implications and Future Directions

The maps produced by Vox-Fusion are usable in practical augmented reality applications, where the reconstructed geometry supports occlusion handling and adapts to the scene. Because the voxel structure is explicit, virtual objects can be integrated into the scene, and the map supports dynamic interactions and complex scene edits.

Vox-Fusion's use of voxel-based neural implicit networks reflects a shift towards scalable, efficient SLAM systems and points to further gains in large-scale environment mapping. Future research might improve the handling of dynamic objects and reduce drift in long-term tracking. The approach could also pave the way for richer and more interactive AR and VR experiences.

GitHub

GitHub - zju3dv/Vox-Fusion: Code for "Dense Tracking and Mapping with Voxel-based Neural Implicit Representation", ISMAR 2022 (268 stars)
