DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking and Loop-Closing (2401.09160v2)
Abstract: The performance of visual SLAM in complex, real-world scenarios is often compromised by unreliable feature extraction and matching when handcrafted features are used. Although deep learning-based local features excel at capturing high-level information and perform well on matching benchmarks, they struggle to generalize in continuous motion scenes, which degrades loop detection accuracy. To address these issues, we propose DK-SLAM, a monocular visual SLAM system with deep keypoint learning, tracking, and loop closing. DK-SLAM employs a Model-Agnostic Meta-Learning (MAML) strategy to optimize the training of the keypoint extraction network, enhancing its adaptability to diverse environments. Additionally, we introduce a coarse-to-fine feature tracking mechanism for the learned keypoints: a direct method first approximates the relative pose between consecutive frames, and a feature matching method then refines the pose estimate. To mitigate cumulative positioning errors, DK-SLAM incorporates a novel online learning module that exploits binary features for loop closure detection; this module dynamically identifies loop nodes within a sequence, ensuring accurate and efficient localization. Experimental evaluations on publicly available datasets demonstrate that DK-SLAM outperforms leading traditional and learning-based SLAM systems such as ORB-SLAM3 and LIFT-SLAM, underscoring the efficacy and robustness of our approach in varied and challenging real-world environments.
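To make the pipeline concrete, below is a minimal sketch of a MAML-style inner/outer training loop for a keypoint network, written in PyTorch. Everything here — the tiny `KeypointNet`, the `sample_task` sampler that stands in for drawing data from one training environment, and all hyperparameters — is a hypothetical illustration of the general technique, not the paper's actual training code.

```python
# Hedged sketch of MAML for keypoint-network training.
# KeypointNet, sample_task, and all hyperparameters are hypothetical
# stand-ins, not DK-SLAM's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointNet(nn.Module):
    """Tiny stand-in for a keypoint heatmap network."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(8, 1, 3, padding=1)

    def forward(self, x):
        return self.conv2(F.relu(self.conv1(x)))

def sample_task(batch=4):
    """Hypothetical task sampler: one 'environment' yields images and
    pseudo ground-truth keypoint heatmaps."""
    return torch.rand(batch, 1, 32, 32), torch.rand(batch, 1, 32, 32)

net = KeypointNet()
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr = 1e-2

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):  # tasks (environments) per meta-batch
        s_x, s_y = sample_task()  # support set
        q_x, q_y = sample_task()  # query set
        # Inner loop: one gradient step on the support set, keeping the
        # graph so the meta-gradient can flow through the adaptation.
        params = dict(net.named_parameters())
        loss = F.mse_loss(net(s_x), s_y)
        grads = torch.autograd.grad(loss, params.values(), create_graph=True)
        adapted = {n: p - inner_lr * g
                   for (n, p), g in zip(params.items(), grads)}
        # Outer loop: evaluate the adapted weights on the query set and
        # accumulate gradients w.r.t. the shared initialization.
        q_pred = torch.func.functional_call(net, adapted, (q_x,))
        F.mse_loss(q_pred, q_y).backward()
    meta_opt.step()
```

Each inner-loop step adapts a copy of the weights to one environment, and the outer step updates the shared initialization so that a single gradient step suffices in a new environment — the adaptability property the abstract attributes to MAML.

The loop-closure module can be illustrated in the same hedged spirit: the sketch below retrieves loop candidates by Hamming-distance matching over packed BRIEF/ORB-style binary descriptors accumulated online. The `BinaryLoopDB` class, its thresholds, and its scoring rule are assumptions for illustration, not DK-SLAM's actual online vocabulary.

```python
# Hedged sketch of loop-candidate retrieval with binary features.
# Descriptors are assumed to be 256-bit codes packed into 32 uint8 bytes;
# the database and score are illustrative, not the paper's module.
import numpy as np

# Lookup table: number of set bits for each possible byte value.
POPCOUNT = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(1)

def hamming(a, b):
    """Pairwise Hamming distances between packed binary descriptors."""
    return POPCOUNT[a[:, None, :] ^ b[None, :, :]].sum(-1)

class BinaryLoopDB:
    def __init__(self, dist_thresh=40, min_gap=30):
        self.frames = []                # list of (frame_id, descriptors)
        self.dist_thresh = dist_thresh  # max Hamming distance for a match
        self.min_gap = min_gap          # skip temporally adjacent frames

    def query_and_add(self, frame_id, desc):
        best = None
        for fid, d in self.frames:
            if frame_id - fid < self.min_gap:
                continue
            # Score = fraction of descriptors with a close nearest neighbour.
            score = (hamming(desc, d).min(axis=1) < self.dist_thresh).mean()
            if best is None or score > best[1]:
                best = (fid, score)
        self.frames.append((frame_id, desc))
        return best  # (candidate frame id, similarity score) or None

db = BinaryLoopDB()
for t in range(100):
    descs = np.random.randint(0, 256, size=(50, 32), dtype=np.uint8)
    cand = db.query_and_add(t, descs)
    if cand and cand[1] > 0.5:
        print(f"loop candidate: frame {cand[0]} (score {cand[1]:.2f})")
```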
- J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2018.
- X. Gao, R. Wang, N. Demmel, and D. Cremers, “LDSO: Direct sparse odometry with loop closure,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018, pp. 2198–2204.
- C. Forster, M. Pizzoli, and D. Scaramuzza, “SVO: Fast semi-direct monocular visual odometry,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 15–22.
- R. Mur-Artal and J. D. Tardós, “ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
- C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
- Z. Teed and J. Deng, “DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras,” in Advances in Neural Information Processing Systems (NeurIPS), 2021.
- J. Tang, L. Ericson, J. Folkesson, and P. Jensfelt, “GCNv2: Efficient correspondence prediction for real-time SLAM,” IEEE Robotics and Automation Letters, vol. 4, pp. 3505–3512, 2019.
- H. M. S. Bruno and E. Colombini, “LIFT-SLAM: A deep-learning feature-based monocular visual SLAM method,” Neurocomputing, vol. 455, pp. 97–110, 2021.
- H. Huang, H. Ye, Y. Sun, and M. Liu, “Monocular visual odometry using learned repeatability and description,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020.
- Y. Lu and G. Lu, “SuperThermal: Matching thermal as visible through thermal feature exploration,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2690–2697, 2021.
- E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in 2011 International Conference on Computer Vision (ICCV), 2011, pp. 2564–2571.
- J. Shi and C. Tomasi, “Good features to track,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1994, pp. 593–600.
- T. Qin, P. Li, and S. Shen, “VINS-Mono: A robust and versatile monocular visual-inertial state estimator,” IEEE Transactions on Robotics, vol. 34, no. 4, pp. 1004–1020, 2018.
- D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.
- M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler, “D2-Net: A trainable CNN for joint description and detection of local features,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8084–8093.
- A. Barroso-Laguna and K. Mikolajczyk, “Key.Net: Keypoint detection by handcrafted and learned CNN filters revisited,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, 2023.
- Y. Tian, B. Fan, and F. Wu, “L2-Net: Deep learning of discriminative patch descriptor in Euclidean space,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- J. Tang, J. Folkesson, and P. Jensfelt, “Geometric correspondence network for camera motion estimation,” IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 1010–1017, 2018.
- D. Li, X. Shi, Q. Long, S. Liu, W. Yang, F. Wang, Q. Wei, and F. Qiao, “DXSLAM: A robust and efficient visual SLAM system with deep features,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 4958–4965.
- P.-E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- J. Jiang, X. Chen, W. Dai, Z. Gao, and Y. Zhang, “Thermal-inertial SLAM for the environments with challenging illumination,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8767–8774, 2022.
- K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, “LIFT: Learned invariant feature transform,” in European Conference on Computer Vision (ECCV), 2016.
- D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Transactions on Robotics, vol. 28, no. 5, pp. 1188–1197, 2012.
- M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary robust independent elementary features,” in European Conference on Computer Vision (ECCV), 2010.
- E. Garcia-Fidalgo and A. Ortiz, “iBoW-LCD: An appearance-based loop-closure detection approach using incremental bags of binary words,” IEEE Robotics and Automation Letters, vol. 3, no. 4, 2018.
- R. Arandjelović, P. Gronat, A. Torii, T. Pajdla, and J. Sivic, “NetVLAD: CNN architecture for weakly supervised place recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
- Y.-Y. Jau, R. Zhu, H. Su, and M. Chandraker, “Deep keypoint-based camera pose estimation with geometric constraints,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020, pp. 4950–4957.
- R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “g2o: A general framework for graph optimization,” in 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 3607–3613.
- Y. Wang, B. Xu, W. Fan, and C. Xiang, “A robust and efficient loop closure detection approach for hybrid ground/aerial vehicles,” Drones, vol. 7, no. 2, 2023.
- J. Bian, W.-Y. Lin, Y. Matsushita, S.-K. Yeung, T.-D. Nguyen, and M.-M. Cheng, “GMS: Grid-based motion statistics for fast, ultra-robust feature correspondence,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2828–2837.
- T.-Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014.
- A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, pp. 1231–1237, 2013.
- M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. Achtelik, and R. Y. Siegwart, “The EuRoC micro aerial vehicle datasets,” The International Journal of Robotics Research, 2016.
- A. Geiger, J. Ziegler, and C. Stiller, “StereoScan: Dense 3D reconstruction in real-time,” in 2011 IEEE Intelligent Vehicles Symposium (IV), 2011.
- S. Umeyama, “Least-squares estimation of transformation parameters between two point patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 376–380, 1991.
- J. Zubizarreta, I. Aguinaga, and J. M. M. Montiel, “Direct sparse mapping,” IEEE Transactions on Robotics, vol. 36, no. 4, 2020.
- H. Yue, J. Miao, Y. Yu, W. Chen, and C. Wen, “Robust loop closure detection based on bag of SuperPoints and graph verification,” in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 3787–3793.
- G. Singh, M. Wu, S. Lam, and D. Van Minh, “Hierarchical loop closure detection for long-term visual SLAM with semantic-geometric descriptors,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 2909–2916.