DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots (2403.00228v3)

Published 1 Mar 2024 in cs.RO and cs.CV

Abstract: We present a framework, DISORF, to enable online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices. To address the limited computing capabilities of edge devices and potentially limited network availability, we design a framework that efficiently distributes computation between the edge device and the remote server. We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers that can perform high-quality 3D reconstruction and visualization at runtime by leveraging recent advances in neural 3D methods. We identify a key challenge with online training where naive image sampling strategies can lead to significant degradation in rendering quality. We propose a novel shifted exponential frame sampling method that addresses this challenge for online training. We demonstrate the effectiveness of our framework in enabling high-quality real-time reconstruction and visualization of unknown scenes as they are captured and streamed from cameras in mobile robots and edge devices.

References (47)
  1. M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3d reconstruction at scale using voxel hashing,” ACM Transactions on Graphics (TOG), vol. 32, 11 2013.
  2. A. Dai, M. Nießner, M. Zollhöfer, S. Izadi, and C. Theobalt, “Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration,” ACM Transactions on Graphics (ToG), vol. 36, no. 4, p. 1, 2017.
  3. T. Whelan, R. F. Salas-Moreno, B. Glocker, A. J. Davison, and S. Leutenegger, “Elasticfusion: Real-time dense slam and light source estimation,” The International Journal of Robotics Research, vol. 35, no. 14, pp. 1697–1716, 2016.
  4. B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” in European Conference on Computer Vision.   Springer, 2020, pp. 405–421.
  5. T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics primitives with a multiresolution hash encoding,” ACM Transactions on Graphics (ToG), vol. 41, no. 4, pp. 1–15, 2022.
  6. S. Fridovich-Keil, A. Yu, M. Tancik, Q. Chen, B. Recht, and A. Kanazawa, “Plenoxels: Radiance fields without neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5501–5510.
  7. A. Chen, Z. Xu, A. Geiger, J. Yu, and H. Su, “Tensorf: Tensorial radiance fields,” in European Conference on Computer Vision.   Springer, 2022, pp. 333–350.
  8. NVIDIA, “Jetson Xavier NX series modules,” 2022, accessed: 2022-06-01. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-nx/
  9. S. Li, C. Li, W. Zhu, B. Yu, Y. Zhao, C. Wan, H. You, H. Shi, and Y. Lin, “Instant-3d: Instant neural radiance field training towards on-device ar/vr 3d reconstruction,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1–13.
  10. J. Yu, J. E. Low, K. Nagami, and M. Schwager, “Nerfbridge: Bringing real-time, online neural radiance field training to robotics,” arXiv preprint arXiv:2305.09761, 2023.
  11. M. Tancik, E. Weber, E. Ng, R. Li, B. Yi, T. Wang, A. Kristoffersen, J. Austin, K. Salahi, A. Ahuja, et al., “Nerfstudio: A modular framework for neural radiance field development,” in ACM SIGGRAPH 2023 Conference Proceedings, 2023, pp. 1–12.
  12. R. Mur-Artal and J. D. Tardós, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE transactions on robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
  13. J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
  14. J. L. Schönberger, E. Zheng, M. Pollefeys, and J.-M. Frahm, “Pixelwise view selection for unstructured multi-view stereo,” in European Conference on Computer Vision (ECCV), 2016.
  15. R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, “Kinectfusion: Real-time dense surface mapping and tracking,” in 2011 10th IEEE International Symposium on Mixed and Augmented Reality, 2011, pp. 127–136.
  16. M. Nießner, M. Zollhöfer, S. Izadi, and M. Stamminger, “Real-time 3d reconstruction at scale using voxel hashing,” ACM Transactions on Graphics (TOG), 2013.
  17. E. Bylow, J. Sturm, C. Kerl, F. Kahl, and D. Cremers, “Real-time camera tracking and 3d reconstruction using signed distance functions,” June 2013.
  18. E. Vespa, N. Nikolov, M. Grimm, L. Nardi, P. H. J. Kelly, and S. Leutenegger, “Efficient octree-based volumetric slam supporting signed-distance and occupancy mapping,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1144–1151, April 2018.
  19. M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb, “Real-time 3d reconstruction in dynamic scenes using point-based fusion,” in 2013 International Conference on 3D Vision-3DV 2013.   IEEE, 2013, pp. 1–8.
  20. Y.-P. Cao, L. Kobbelt, and S.-M. Hu, “Real-time high-accuracy three-dimensional reconstruction with consumer rgb-d cameras,” ACM Transactions on Graphics (TOG), vol. 37, no. 5, pp. 1–16, 2018.
  21. J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove, “Deepsdf: Learning continuous signed distance functions for shape representation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 165–174.
  22. L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy networks: Learning 3d reconstruction in function space,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4460–4470.
  23. E. Sucar, K. Wada, and A. Davison, “Nodeslam: Neural object descriptors for multi-view shape reconstruction,” in 2020 International Conference on 3D Vision (3DV).   IEEE, 2020, pp. 949–958.
  24. J. Huang, S.-S. Huang, H. Song, and S.-M. Hu, “Di-fusion: Online implicit 3d reconstruction with deep priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8932–8941.
  25. S. Weder, J. L. Schonberger, M. Pollefeys, and M. R. Oswald, “Neuralfusion: Online depth fusion in latent space,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 3162–3172.
  26. J. Sun, Y. Xie, L. Chen, X. Zhou, and H. Bao, “Neuralrecon: Real-time coherent 3d reconstruction from monocular video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15598–15607.
  27. A. Düzçeker, S. Galliani, C. Vogel, P. Speciale, M. Dusmanu, and M. Pollefeys, “Deepvideomvs: Multi-view stereo on video with recurrent spatio-temporal fusion,” arXiv preprint arXiv:2012.02177, 2020.
  28. Z. Teed and J. Deng, “Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras,” Advances in Neural Information Processing Systems, vol. 34, pp. 16558–16569, 2021.
  29. J. Kerr, L. Fu, H. Huang, Y. Avigal, M. Tancik, J. Ichnowski, A. Kanazawa, and K. Goldberg, “Evo-nerf: Evolving nerf for sequential robot grasping of transparent objects,” in 6th Annual Conference on Robot Learning, 2022.
  30. L. Yen-Chen, P. Florence, J. T. Barron, T.-Y. Lin, A. Rodriguez, and P. Isola, “Nerf-supervision: Learning dense object descriptors from neural radiance fields,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 6496–6503.
  31. A. Byravan, J. Humplik, L. Hasenclever, A. Brussee, F. Nori, T. Haarnoja, B. Moran, S. Bohez, F. Sadeghi, B. Vujatovic, et al., “Nerf2real: Sim2real transfer of vision-guided bipedal motion skills using neural radiance fields,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 9362–9369.
  32. L. Yen-Chen, P. Florence, J. T. Barron, A. Rodriguez, P. Isola, and T.-Y. Lin, “inerf: Inverting neural radiance fields for pose estimation,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2021, pp. 1323–1330.
  33. M. Adamkiewicz, T. Chen, A. Caccavale, R. Gardner, P. Culbertson, J. Bohg, and M. Schwager, “Vision-only robot navigation in a neural radiance world,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4606–4613, 2022.
  34. R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in cognitive sciences, vol. 3, no. 4, pp. 128–135, 1999.
  35. A. Rosenfeld and J. K. Tsotsos, “Incremental learning through deep adaptation,” IEEE transactions on pattern analysis and machine intelligence, vol. 42, no. 3, pp. 651–663, 2018.
  36. M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, “A continual learning survey: Defying forgetting in classification tasks,” IEEE transactions on pattern analysis and machine intelligence, vol. 44, no. 7, pp. 3366–3385, 2021.
  37. S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “icarl: Incremental classifier and representation learning,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017, pp. 2001–2010.
  38. D. Rolnick, A. Ahuja, J. Schwarz, T. Lillicrap, and G. Wayne, “Experience replay for continual learning,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  39. E. Sucar, S. Liu, J. Ortiz, and A. J. Davison, “imap: Implicit mapping and positioning in real-time,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 6229–6238.
  40. Z. Zhu, S. Peng, V. Larsson, W. Xu, H. Bao, Z. Cui, M. R. Oswald, and M. Pollefeys, “Nice-slam: Neural implicit scalable encoding for slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12786–12796.
  41. H. Wang, J. Wang, and L. Agapito, “Co-slam: Joint coordinate and sparse parametric encodings for neural real-time slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13293–13302.
  42. M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng, et al., “Ros: an open-source robot operating system,” in ICRA workshop on open source software, vol. 3, no. 3.2.   Kobe, Japan, 2009, p. 5.
  43. J. Chung, K. Lee, S. Baik, and K. M. Lee, “Meil-nerf: Memory-efficient incremental learning of neural radiance fields,” arXiv preprint arXiv:2212.08328, 2022.
  44. Z. Wang, S. Wu, W. Xie, M. Chen, and V. A. Prisacariu, “Nerf–: Neural radiance fields without known camera parameters,” arXiv preprint arXiv:2102.07064, 2021.
  45. J. Straub, T. Whelan, L. Ma, Y. Chen, E. Wijmans, S. Green, J. J. Engel, R. Mur-Artal, C. Ren, S. Verma, et al., “The replica dataset: A digital replica of indoor spaces,” arXiv preprint arXiv:1906.05797, 2019.
  46. J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems,” in Proc. of the International Conference on Intelligent Robot Systems (IROS), Oct. 2012.
  47. B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics (ToG), vol. 42, no. 4, pp. 1–14, 2023.

Summary

  • The paper introduces a distributed computation model that assigns pose estimation to edge devices and NeRF training to remote servers.
  • It proposes a shifted exponential frame sampling strategy that significantly improves rendering quality in online NeRF training.
  • The integration with SLAM systems ensures efficient keyframe generation and high-quality 3D scene visualization.

DISORF: A Novel Framework for Online NeRF Training and Visualization on Mobile Robots

Introduction

Online 3D reconstruction and visualization, particularly with Neural Radiance Fields (NeRFs), underpins a wide range of applications in robotics and augmented reality. "DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots" by Chunlin Li et al. addresses the challenges of deploying NeRF on resource-constrained mobile robots and edge devices such as drones. The paper introduces DISORF, a distributed framework that enables real-time, high-quality 3D scene reconstruction and visualization by splitting the computational workload between edge devices and remote servers.
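To make this division of labor concrete, the sketch below shows one way a DISORF-style edge/server split could be organized: the edge process runs SLAM on incoming camera frames and forwards only posed keyframes, while the server process drains arriving keyframes into a buffer between training iterations. All names here (`PosedKeyframe`, `KeyframeBuffer`, `edge_loop`, `server_loop`) and the assumed `slam.track` / `trainer.step` interfaces are illustrative, not the paper's actual implementation; the in-process queue stands in for whatever transport is used in practice.

```python
# Illustrative sketch of an edge/server split for online reconstruction.
# All class and function names are hypothetical, not from the DISORF codebase.
from dataclasses import dataclass
from queue import Queue
from typing import List

import numpy as np


@dataclass
class PosedKeyframe:
    """A keyframe produced by the on-device SLAM system (assumed message format)."""
    frame_id: int
    image: np.ndarray       # H x W x 3, uint8
    pose_c2w: np.ndarray    # 4 x 4 camera-to-world transform estimated by SLAM
    timestamp: float


def edge_loop(slam, camera, tx_queue: Queue) -> None:
    """Edge side: track every frame on-device, but transmit only posed keyframes."""
    for frame_id, (image, timestamp) in enumerate(camera):
        pose_c2w, is_keyframe = slam.track(image)   # lightweight tracking on the edge
        if is_keyframe:
            tx_queue.put(PosedKeyframe(frame_id, image, pose_c2w, timestamp))


class KeyframeBuffer:
    """Server side: accumulates keyframes for online NeRF/3DGS training."""

    def __init__(self) -> None:
        self.keyframes: List[PosedKeyframe] = []

    def drain(self, rx_queue: Queue) -> None:
        while not rx_queue.empty():
            self.keyframes.append(rx_queue.get())


def server_loop(rx_queue: Queue, trainer, num_iters: int) -> None:
    """Server side: interleave keyframe ingestion with online training steps."""
    buffer = KeyframeBuffer()
    for _ in range(num_iters):
        buffer.drain(rx_queue)                              # absorb newly arrived keyframes
        if buffer.keyframes:
            batch = trainer.sample_batch(buffer.keyframes)  # e.g. recency-weighted sampling
            trainer.step(batch)                             # one online training iteration
```

In a real deployment the queue would be replaced by a network channel (for example a ROS topic or a socket), but the structure is the same: the edge side only tracks and filters frames, and the server side absorbs whatever has arrived before each training step.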

Key Contributions

  • Distributed Computation between Edge and Server: DISORF divides the workload so that pose estimation runs on the edge device while the computationally intensive NeRF training runs on a remote server. This split works around the limited computational power of edge devices.
  • Shifted Exponential Frame Sampling Method: The authors show that naive image sampling strategies in online NeRF training lead to degraded rendering quality, and they introduce a shifted exponential frame sampling strategy that dynamically shifts emphasis toward more recently captured frames as training progresses (see the sketch after this list).
  • Integration with SLAM Systems: The framework leverages on-device Simultaneous Localization and Mapping (SLAM) to generate posed keyframes, which are transmitted to the remote server for NeRF training and rendering. This keeps pose estimation robust and limits the data that must cross a potentially bandwidth-constrained network.
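The paper's exact shifted exponential distribution is not reproduced here, so the following is a minimal sketch of the idea under stated assumptions: each keyframe receives a sampling weight that decays exponentially with its age, shifted by a small uniform floor so that older frames are still revisited. The `decay` and `floor` values are placeholder choices, not the authors' parameters.

```python
# Minimal sketch of recency-weighted frame sampling for online training.
# The exponential decay rate and the uniform "shift" are illustrative
# assumptions; the paper defines its own shifted exponential distribution.
import numpy as np


def frame_sampling_probs(num_frames: int, decay: float = 0.1, floor: float = 0.1) -> np.ndarray:
    """Sampling probability over keyframes 0..num_frames-1 (index num_frames-1 is newest).

    weight(i) = exp(-decay * age(i)) + floor, normalized to sum to 1.
    Recent frames are favored; the floor keeps older frames in rotation.
    """
    ages = np.arange(num_frames)[::-1].astype(float)  # newest keyframe has age 0
    weights = np.exp(-decay * ages) + floor
    return weights / weights.sum()


def sample_training_frames(num_frames: int, batch_frames: int, rng=None) -> np.ndarray:
    """Choose which keyframes contribute rays to the current training iteration."""
    if rng is None:
        rng = np.random.default_rng()
    probs = frame_sampling_probs(num_frames)
    return rng.choice(num_frames, size=batch_frames, p=probs, replace=True)


if __name__ == "__main__":
    # Example: with 100 received keyframes, pick 8 frames for one iteration.
    print(sample_training_frames(num_frames=100, batch_frames=8))
```

Compared with uniform sampling, which spreads training effort evenly over all received frames, a recency-biased distribution of this kind concentrates updates on newly captured regions of the scene, while the floor term keeps replaying earlier keyframes so they are not forgotten.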

Findings and Implications

Through experiments on scenes from the Replica and Tanks and Temples datasets, DISORF demonstrates high-quality real-time scene reconstruction and visualization. The shifted exponential frame sampling method in particular shows a marked improvement over uniform sampling and over incremental learning strategies such as the one used by iMAP, pointing to a promising direction for optimizing online NeRF training. When applied to a different 3D representation, 3D Gaussian Splatting (3DGS), the proposed sampling strategy still improves rendering quality, demonstrating its versatility.

Looking Ahead

The successful deployment of DISORF opens multiple avenues for future work in real-time 3D reconstruction and visualization. Potential applications span autonomous navigation, remote surveillance, dynamic scene understanding, and augmented reality, especially in resource-constrained environments. Further research could optimize the network protocols for more efficient data transmission, extend the framework to support a wider variety of edge devices, and scale the distributed computation model to leverage cloud computing resources for larger-scale deployments.

Furthermore, integrating the framework with advanced SLAM algorithms could further refine the pose estimation and keyframe generation, potentially enabling even more detailed and accurate 3D reconstructions. Finally, examining the adaptability of the shifted exponential frame sampling method across other implicit neural representation models could yield insights into more universally applicable techniques for enhancing online NeRF training regimes.

Conclusion

DISORF represents a significant step forward in the domain of online 3D reconstruction and visualization, particularly for mobile robotics applications. By addressing the computational challenges and proposing an innovative sampling strategy, this research not only advances our capabilities in real-time 3D scene rendering but also sets the stage for future innovations that could further revolutionize our interaction with and understanding of dynamic environments.