
CudaSIFT-SLAM: multiple-map visual SLAM for full procedure mapping in real human endoscopy (2405.16932v1)

Published 27 May 2024 in cs.RO and cs.CV

Abstract: Monocular visual simultaneous localization and mapping (V-SLAM) is nowadays an irreplaceable tool in mobile robotics and augmented reality, where it performs robustly. However, human colonoscopies pose formidable challenges such as occlusions, blur, light changes, lack of texture, deformation, water jets, and tool interaction, which result in very frequent tracking losses. ORB-SLAM3, the top-performing multiple-map V-SLAM system, is unable to recover from them by merging sub-maps or relocalizing the camera, due to the poor performance of its place recognition algorithm based on ORB features and the DBoW2 bag-of-words. We present CudaSIFT-SLAM, the first V-SLAM system able to process complete human colonoscopies in real time. To overcome the limitations of ORB-SLAM3, we use SIFT instead of ORB features and replace the DBoW2 direct index with the more computationally demanding brute-force matching, which enables successful matching of images separated in time for relocalization and map merging. Real-time performance is achieved thanks to CudaSIFT, a GPU implementation of SIFT extraction and brute-force matching. We benchmark our system on the C3VD phantom colon dataset and on a full real colonoscopy from the Endomapper dataset, demonstrating its ability to merge sub-maps and relocalize within them, obtaining significantly longer sub-maps. Our system successfully maps in real time 88% of the frames in the C3VD dataset. In a real screening colonoscopy, despite the much higher prevalence of occluded and blurred frames, the mapping coverage is 53% in carefully explored areas and 38% in the full sequence, a 70% improvement over ORB-SLAM3.
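To make the place-recognition change concrete, below is a minimal sketch of the SIFT-plus-brute-force matching recipe the abstract describes. It uses OpenCV on the CPU purely for illustration; the actual system relies on the CudaSIFT GPU library for real-time throughput, and the function name, file names, and ratio-test threshold here are illustrative assumptions, not values taken from the paper.

```python
# Minimal CPU sketch of SIFT extraction + brute-force matching (OpenCV),
# illustrating the place-recognition recipe described in the abstract.
# The real system uses the CudaSIFT GPU library; the function name and
# the 0.75 ratio threshold are illustrative assumptions, not paper values.
import cv2

def match_frames(img_a, img_b, ratio=0.75):
    """Return putative SIFT correspondences between two grayscale frames."""
    sift = cv2.SIFT_create()
    kps_a, desc_a = sift.detectAndCompute(img_a, None)
    kps_b, desc_b = sift.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return []  # blurred or occluded frames may yield no features

    # Brute-force matching with the L2 norm (appropriate for SIFT):
    # every descriptor in frame A is compared against every descriptor
    # in frame B, replacing the approximate DBoW2 direct-index lookup.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn_pairs = matcher.knnMatch(desc_a, desc_b, k=2)

    # Lowe's ratio test rejects ambiguous matches.
    return [m for m, n in (p for p in knn_pairs if len(p) == 2)
            if m.distance < ratio * n.distance]

# Usage sketch: matches between two frames separated in time would feed
# geometric verification before triggering relocalization or map merging.
# img1 = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
# img2 = cv2.imread("frame_0750.png", cv2.IMREAD_GRAYSCALE)
# print(len(match_frames(img1, img2)), "putative matches")
```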

References (23)
  1. C. Campos, R. Elvira, J. J. G. Rodríguez, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM,” IEEE Transactions on Robotics, vol. 37, no. 6, pp. 1874–1890, 2021.
  2. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, pp. 91–110, 2004.
  3. T. L. Bobrow, M. Golhar, R. Vijayan, V. S. Akshintala, J. R. Garcia, and N. J. Durr, “Colonoscopy 3D video dataset with paired depth from 2D-3D registration,” Medical Image Analysis, p. 102956, 2023.
  4. P. Azagra, C. Sostres, Á. Ferrandez, L. Riazuelo, C. Tomasini, O. L. Barbed, J. Morlana, D. Recasens, V. M. Batlle, J. J. Gómez-Rodríguez, R. Elvira, J. López, C. Oriol, J. Civera, J. D. Tardós, A. C. Murillo, A. Lanas, and J. M. M. Montiel, “Endomapper dataset of complete calibrated endoscopy procedures,” Scientific Data, vol. 10, no. 1, p. 671, 2023.
  5. Y. Wang, L. Zhao, L. Gong, X. Chen, and S. Zuo, “A monocular SLAM system based on SIFT features for gastroscope tracking,” Medical & Biological Engineering & Computing, vol. 61, no. 2, pp. 511–523, 2023.
  6. N. Mahmoud, T. Collins, A. Hostettler, L. Soler, C. Doignon, and J. M. M. Montiel, “Live tracking and dense reconstruction for handheld monocular endoscopy,” IEEE Transactions on Medical Imaging, vol. 38, no. 1, pp. 79–89, 2018.
  7. P. Mountney, D. Stoyanov, and G.-Z. Yang, “Three-dimensional tissue deformation recovery and tracking,” IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 14–24, 2010.
  8. X. Liu, Z. Li, M. Ishii, G. D. Hager, R. H. Taylor, and M. Unberath, “SAGE: SLAM with appearance and geometry prior for endoscopy,” in 2022 International conference on robotics and automation (ICRA), pp. 5587–5593, IEEE, 2022.
  9. J. J. Gomez-Rodriguez, J. M. M. Montiel, and J. D. Tardos, “NR-SLAM: Non-rigid monocular SLAM,” arXiv preprint arXiv:2308.04036, 2023.
  10. R. Ma, R. Wang, Y. Zhang, S. Pizer, S. K. McGill, J. Rosenman, and J.-M. Frahm, “RNNSLAM: Reconstructing the 3D colon to visualize missing regions during a colonoscopy,” Medical Image Analysis, vol. 72, p. 102100, 2021.
  11. J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang, “Dense surface reconstruction for enhanced navigation in MIS,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2011, pp. 89–96, Springer, 2011.
  12. J. Song, J. Wang, L. Zhao, S. Huang, and G. Dissanayake, “MIS-SLAM: Real-time large-scale dense deformable SLAM system in minimal invasive surgery based on heterogeneous computing,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 4068–4075, 2018.
  13. J. Song, R. Zhang, Q. Zhu, J. Lin, and M. Ghaffari, “BDIS-SLAM: a lightweight CPU-based dense stereo SLAM for surgery,” International Journal of Computer Assisted Radiology and Surgery, pp. 1–10, 2024.
  14. H. Zhou and J. Jayender, “EMDQ-SLAM: Real-time high-resolution reconstruction of soft tissue surface from stereo laparoscopy videos,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, pp. 331–340, Springer, 2021.
  15. D. J. Mirota, H. Wang, R. H. Taylor, M. Ishii, G. L. Gallia, and G. D. Hager, “A system for video-based navigation for endoscopic endonasal skull base surgery,” IEEE Transactions on Medical Imaging, vol. 31, no. 4, pp. 963–976, 2011.
  16. R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: a versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
  17. J. Shi and C. Tomasi, “Good features to track,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 593–600, IEEE, 1994.
  18. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI’81: 7th International Joint Conference on Artificial Intelligence, vol. 2, pp. 674–679, 1981.
  19. J. Engel, V. Koltun, and D. Cremers, “Direct sparse odometry,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 3, pp. 611–625, 2018.
  20. B. K. P. Horn, “Closed-form solution of absolute orientation using unit quaternions,” JOSA A, vol. 4, no. 4, pp. 629–642, 1987.
  21. J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1335–1340, 2006.
  22. H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133–135, 1981.
  23. J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2016.