Scalable and Efficient Hierarchical Visual Topological Mapping (2404.05023v1)
Abstract: Hierarchical topological representations can significantly reduce search times within mapping and localization algorithms. Although recent research has shown the potential of such approaches, limited consideration has been given to the suitability and comparative performance of different global feature representations in this context. In this work, we evaluate state-of-the-art hand-crafted and learned global descriptors within a hierarchical topological mapping technique on benchmark datasets, and present a comprehensive evaluation of the impact of the choice of global descriptor. Although learned descriptors have been incorporated into place recognition methods to improve retrieval accuracy and overall recall, most studies have not adequately addressed scalability and efficiency on longer trajectories. Based on an empirical analysis of multiple runs, we identify continuity and distinctiveness as crucial characteristics of a global descriptor that enable efficient and scalable hierarchical mapping, and we present a methodology for quantifying and contrasting these characteristics across different global descriptors. Our study demonstrates that a global descriptor based on an unsupervised learned Variational Autoencoder (VAE) excels in both characteristics and achieves significantly lower runtime: on a consumer-grade desktop it runs up to 2.3x faster than the second-best global descriptor, NetVLAD, and up to 9.5x faster than the hand-crafted descriptor, PHOG, on the longest track evaluated (St Lucia, 17.6 km), without sacrificing overall recall performance.
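The abstract names continuity (descriptors vary smoothly along the trajectory) and distinctiveness (distinct places map to well-separated descriptors) as the key properties, but does not reproduce the paper's quantification methodology here. The sketch below is therefore only a minimal illustration of how such properties could be measured over a sequence of global descriptors; the function names, the cosine-similarity formulation, and the temporal exclusion `window` parameter are assumptions for illustration, not the authors' definitions.

```python
import numpy as np

def continuity(desc: np.ndarray) -> float:
    """Mean cosine similarity between temporally adjacent descriptors.

    `desc` is an (N, D) array of L2-normalised global descriptors
    (e.g. VAE latents, NetVLAD vectors, or PHOG histograms).
    Higher values indicate the descriptor varies smoothly along the route.
    """
    sims = np.sum(desc[:-1] * desc[1:], axis=1)  # dot products of consecutive rows
    return float(np.mean(sims))

def distinctiveness(desc: np.ndarray, window: int = 10) -> float:
    """Mean cosine distance between each descriptor and all frames outside
    a temporal exclusion window (a hypothetical formulation).

    Higher values indicate that spatially distinct places are well separated
    in descriptor space.
    """
    n = desc.shape[0]
    sims = desc @ desc.T                                   # pairwise cosine similarities
    lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    mask = lag > window                                    # keep temporally distant pairs only
    return float(np.mean(1.0 - sims[mask]))

# Example with random unit vectors standing in for real descriptors.
rng = np.random.default_rng(0)
d = rng.normal(size=(100, 64))
d /= np.linalg.norm(d, axis=1, keepdims=True)
print(continuity(d), distinctiveness(d))
```

Under a formulation like this, a descriptor that scores well on both measures supports the hierarchical coarse-to-fine search the paper relies on: smooth variation keeps consecutive frames in the same map node, while good separation keeps node-level retrieval unambiguous.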
- J. Sivic and A. Zisserman, “Video Google: A text retrieval approach to object matching in videos,” in IEEE ICCV, pp. 1470–1477, 2003.
- D. Nistér and H. Stewénius, “Scalable recognition with a vocabulary tree,” in IEEE CVPR, vol. 2, pp. 2161–2168, 2006.
- H. Korrapati and Y. Mezouar, “Vision-based sparse topological mapping,” Robotics and Autonomous Systems, vol. 62, no. 9, pp. 1259–1270, 2014.
- E. Garcia-Fidalgo and A. Ortiz, “Hierarchical Place Recognition for Topological Mapping,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1061–1074, 2017.
- A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof, “From structure-from-motion point clouds to fast location recognition,” in IEEE CVPR, pp. 2599–2606, 2009.
- S. Middelberg, T. Sattler, O. Untzelmann, and L. Kobbelt, “Scalable 6-DOF localization on mobile devices,” in ECCV, pp. 268–283, 2014.
- P.-E. Sarlin, F. Debraine, M. Dymczyk, R. Siegwart, and C. Cadena, “Leveraging deep visual descriptors for hierarchical efficient localization,” in Conference on Robot Learning (CoRL), pp. 456–465, 2018.
- P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From Coarse to Fine: Robust Hierarchical Localization at Large Scale,” in IEEE/CVF CVPR, pp. 12708–12717, 2019.
- M. Cummins and P. Newman, “FAB-MAP: Probabilistic localization and mapping in the space of appearance,” The International Journal of Robotics Research, vol. 27, no. 6, pp. 647–665, 2008.
- M. Cummins and P. Newman, “Appearance-only SLAM at large scale with FAB-MAP 2.0,” The International Journal of Robotics Research, vol. 30, no. 9, pp. 1100–1123, 2011.
- M. J. Milford and G. F. Wyeth, “SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights,” in IEEE ICRA, pp. 1643–1649, 2012.
- D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Transactions on Robotics, vol. 28, no. 5, pp. 1188–1197, 2012.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in IEEE CVPR, pp. 770–778, 2016.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in IEEE CVPR, pp. 779–788, 2016.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE TPAMI, vol. 40, no. 4, pp. 834–848, 2018.
- S. Ramachandran and J. McDonald, “Place Recognition in Challenging Conditions,” in IMVIP, 2019.
- R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” in IEEE CVPR, pp. 5297–5307, 2016.
- J. Yu, C. Zhu, J. Zhang, Q. Huang, and D. Tao, “Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition,” IEEE TNNLS, vol. 31, no. 2, pp. 661–674, 2020.
- S. Hausler, S. Garg, M. Xu, M. Milford, and T. Fischer, “Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition,” in IEEE/CVF CVPR, pp. 14136–14147, 2021.
- Y. Cai, J. Zhao, J. Cui, F. Zhang, T. Feng, and C. Ye, “Patch-NetVLAD+: Learned patch descriptor and weighted matching strategy for place recognition,” in IEEE MFI, pp. 1–8, 2022.
- A. Khaliq, M. Milford, and S. Garg, “MultiRes-NetVLAD: Augmenting Place Recognition Training With Low-Resolution Imagery,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3882–3889, 2022.
- C. Toft, E. Stenborg, L. Hammarstrand, L. Brynte, M. Pollefeys, T. Sattler, and F. Kahl, “Semantic match consistency for long-term visual localization,” in ECCV, pp. 383–399, 2018.
- S. Garg, N. Suenderhauf, and M. Milford, “LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics,” in Robotics: Science and Systems (RSS), 2018.
- S. Ramachandran, J. Horgan, G. Sistu, and J. McDonald, “Fast and Efficient Scene Categorization for Autonomous Driving using VAEs,” in IMVIP, pp. 9–16, 2022.
- A. Kumar, P. Sattigeri, and A. Balakrishnan, “Variational inference of disentangled latent concepts from unlabeled observations,” in ICLR, 2018.
- A. Bosch, A. Zisserman, and X. Munoz, “Representing shape with a spatial pyramid kernel,” in ACM CIVR, pp. 401–408, 2007.
- J. Bai, F. Lu, K. Zhang, et al., “ONNX: Open Neural Network Exchange.” https://github.com/onnx/onnx, 2019.
- A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in IEEE CVPR, pp. 3354–3361, 2012.
- A. J. Glover, W. P. Maddern, M. J. Milford, and G. F. Wyeth, “FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day,” in IEEE ICRA, pp. 3507–3512, 2010.
- S. Ramachandran and J. McDonald, “OdoViz: A 3D Odometry Visualization and Processing Tool,” in IEEE ITSC, pp. 1391–1398, 2021.
- R. Arroyo, P. F. Alcantarilla, L. M. Bergasa, J. J. Yebes, and S. Bronte, “Fast and effective visual place recognition using binary codes and disparity information,” in IEEE/RSJ IROS, pp. 3089–3094, 2014.
- A. Torii, J. Sivic, T. Pajdla, and M. Okutomi, “Visual place recognition with repetitive structures,” in IEEE CVPR, pp. 883–890, 2013.
- G. Lin, A. Milan, C. Shen, and I. Reid, “RefineNet: Multi-path refinement networks for high-resolution semantic segmentation,” in IEEE CVPR, pp. 1925–1934, 2017.
- M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in IEEE CVPR, pp. 3213–3223, 2016.