CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools (2312.07352v2)
Abstract: Tool tracking in surgical videos is essential for advancing computer-assisted interventions, such as skill assessment, safety zone estimation, and human-machine collaboration. However, the lack of context-rich datasets limits AI applications in this field. Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics, such as tools moving out of the camera's view or exiting the body. This results in less clinically relevant trajectories and a lack of flexibility for real-world surgical applications. Methods trained on these datasets often struggle with visual challenges such as smoke, reflection, and bleeding, further exposing the limitations of current approaches. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. It redefines tracking formalization with three perspectives: (i) intraoperative, (ii) intracorporeal, and (iii) visibility, enabling adaptable and clinically meaningful tool trajectories. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances. Annotations include spatial location, category, identity, operator, phase, and scene visual challenge. Benchmarking state-of-the-art methods on CholecTrack20 reveals significant performance gaps, with current approaches (<45% HOTA) failing to meet the accuracy required for clinical translation. These findings motivate the need for advanced and intuitive tracking algorithms and establish CholecTrack20 as a foundation for developing robust AI-driven surgical assistance systems.
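The abstract lists the per-instance annotation fields (spatial location, category, identity under three tracking perspectives, operator, phase, visual challenge). A minimal sketch of what one such record could look like is below; the field names and value choices are illustrative assumptions, not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ToolAnnotation:
    """One labeled tool instance in a frame.

    Hypothetical schema sketched from the abstract's field list;
    names and types are assumptions, not CholecTrack20's real format.
    """
    frame_id: int
    bbox: Tuple[float, float, float, float]  # spatial location, e.g. (x, y, w, h)
    category: str          # tool class, e.g. "grasper"
    visibility_id: int     # identity within the camera's field of view
    intracorporeal_id: int # identity while the tool remains inside the body
    intraoperative_id: int # identity across the entire procedure
    operator: str          # which operator handles the tool
    phase: str             # surgical phase at this frame
    visual_challenge: str  # scene condition, e.g. "smoke", "bleeding", "none"

# Example: a tool that left the camera's view and re-entered may receive a
# fresh visibility identity while its intraoperative identity persists.
before = ToolAnnotation(100, (0.2, 0.3, 0.1, 0.1), "grasper",
                        visibility_id=1, intracorporeal_id=1,
                        intraoperative_id=1, operator="main surgeon",
                        phase="dissection", visual_challenge="none")
after = ToolAnnotation(180, (0.5, 0.4, 0.1, 0.1), "grasper",
                       visibility_id=2, intracorporeal_id=1,
                       intraoperative_id=1, operator="main surgeon",
                       phase="dissection", visual_challenge="smoke")
```

The point of the three identity fields is that the same physical tool can have different trajectory segmentations depending on the chosen perspective, which is what makes the formalization adaptable to different clinical use cases.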