
EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner (2006.16670v3)

Published 30 Jun 2020 in cs.CV

Abstract: Deep learning techniques hold promise to develop dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings as well as synthetically generated data. A Panda robotic arm, two commercially available capsule endoscopes, two conventional endoscopes with different camera properties, and two high precision 3D scanners were employed to collect data from 8 ex-vivo porcine gastrointestinal (GI)-tract organs. In total, 35 sub-datasets are provided with 6D pose ground truth for the ex-vivo part: 18 sub-datasets for colon, 12 sub-datasets for stomach and 5 sub-datasets for small intestine, while four of these contain polyp-mimicking elevations carried out by an expert gastroenterologist. Synthetic capsule endoscopy frames from GI-tract with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propound Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with spatial attention module in order to dictate the network to focus on distinguishable and highly textured tissue regions. The proposed approach makes use of a brightness-aware photometric loss to improve the robustness under fast frame-to-frame illumination changes. To exemplify the use-case of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state-of-the-art. The codes and the link for the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible through https://www.youtube.com/watch?v=G_LCe0aWWdQ.

Citations (3)

Summary

  • The paper presents a novel dataset and method for unsupervised monocular depth and pose estimation in endoscopic videos.
  • It employs a spatial attention module with a brightness-aware photometric loss to enhance performance under variable illumination.
  • Numerical evaluations show significant reductions in trajectory and pose estimation errors versus state-of-the-art approaches.

Overview of the EndoSLAM Dataset and Endo-SfMLearner Method

The paper presents an innovative contribution to the field of medical image analysis, particularly focusing on the application of deep learning techniques to endoscopic videos for dense topography reconstruction and pose estimation. This research addresses a significant gap in the availability of comprehensive datasets for benchmarking such methodologies, introducing the EndoSLAM dataset alongside a novel unsupervised monocular visual odometry and depth estimation approach termed Endo-SfMLearner.

The EndoSLAM dataset is curated to cover the data types and recording conditions needed to develop and evaluate SLAM (Simultaneous Localization and Mapping) algorithms for endoscopic applications. It includes 3D point cloud data for six porcine organs and recordings from both capsule and standard endoscopy procedures, augmented by synthetically generated data. To capture diverse recording scenarios, the authors employed a setup integrating a robotic arm, multiple endoscopes with varying intrinsic properties, and high-precision 3D scanners. Each of the 35 sub-datasets contains 6D pose ground truth, which is essential for assessing SLAM methods in a clinical context reliant on accurate localization.

Alongside the dataset, the Endo-SfMLearner method proposes a robust framework for unsupervised monocular depth and pose estimation tailored to the endoscopic domain. This approach leverages a spatial attention module integrated with residual networks, aiming to direct the network's focus toward highly textured and distinguishable tissue areas, which are crucial in medical diagnostics. Additionally, the incorporation of a brightness-aware photometric loss function mitigates the challenges posed by variable illumination conditions common in endoscopic captures, enhancing the performance robustness under these circumstances.
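To make the brightness-aware photometric loss concrete, the following is a minimal numpy sketch of one common way such a loss can be realized: before comparing the warped source frame to the target frame, a per-image affine brightness transform is estimated and applied so that global frame-to-frame illumination changes are not penalized. The function name and the closed-form affine fit are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def brightness_aware_photometric_loss(target, source_warped):
    """L1 photometric loss after an affine brightness alignment.

    Hypothetical sketch: the warped source is aligned to the target
    with a per-image affine transform a*I + b, fitted in closed form
    by least squares, so that a global illumination change between
    frames contributes (almost) nothing to the loss.
    """
    t = target.ravel().astype(np.float64)
    s = source_warped.ravel().astype(np.float64)
    # Least-squares fit of a, b minimizing ||a*s + b - t||^2
    a, b = np.polyfit(s, t, 1)
    aligned = a * source_warped.astype(np.float64) + b
    return float(np.mean(np.abs(aligned - target)))
```

Under this sketch, a frame pair that differs only by a global brightness change yields a near-zero loss, while genuine geometric or photometric mismatch (from a wrong depth or pose estimate) still produces a large residual, which is the signal the unsupervised training relies on.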

Numerical evaluations show that Endo-SfMLearner outperforms existing state-of-the-art approaches, including SC-SfMLearner, SfMLearner, and Monodepth2, in both real and synthetic environments. Notably, the method achieved lower absolute trajectory and relative pose estimation errors, with the largest reductions in challenging scenarios.
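The absolute trajectory error (ATE) used in such comparisons can be sketched as follows. This is a simplified illustration: full ATE evaluation typically also solves for rotation (and sometimes scale) between trajectories, e.g. via Umeyama alignment, whereas this sketch only removes the translational offset.

```python
import numpy as np

def absolute_trajectory_error(gt_positions, est_positions):
    """Simplified ATE: RMSE of point-to-point distances after
    aligning the two trajectories by their centroids.

    gt_positions, est_positions: (N, 3) arrays of camera positions
    sampled at corresponding timestamps.
    """
    gt = np.asarray(gt_positions, dtype=np.float64)
    est = np.asarray(est_positions, dtype=np.float64)
    # Remove the constant translational offset between trajectories
    gt_c = gt - gt.mean(axis=0)
    est_c = est - est.mean(axis=0)
    # Root-mean-square of per-point Euclidean distances
    return float(np.sqrt(np.mean(np.sum((gt_c - est_c) ** 2, axis=1))))
```

An estimated trajectory that matches the ground truth up to a constant offset scores an ATE of zero; drift or per-frame pose errors raise the score, which is why ATE is a standard summary metric for visual odometry benchmarks.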

The implications of this work extend beyond immediate contributions to SLAM in endoscopy. The comprehensive dataset and advanced methodological framework open avenues for further research into unsupervised learning techniques and their applications within medical settings. Future work could explore enhancements in adaptability across diverse organ types and incorporate other imaging modalities to expand the dataset's applicability. Moreover, integrating Endo-SfMLearner with tasks like segmentation and anomaly detection could foster advancements in multi-task and meta-learning, further solidifying its utility in clinical diagnostics.

In conclusion, the EndoSLAM dataset and Endo-SfMLearner method represent pivotal steps forward in enhancing the efficacy and precision of endoscopic procedures, with substantial potential to influence future technological and clinical applications in gastrointestinal healthcare.