
General surgery vision transformer: A video pre-trained foundation model for general surgery (2403.05949v3)

Published 9 Mar 2024 in cs.CV, cs.LG, and q-bio.TO

Abstract: The absence of openly accessible data and specialized foundation models is a major barrier for computational research in surgery. Toward this, (i) we open-source the largest dataset of general surgery videos to date, consisting of 680 hours of surgical videos, including data from robotic and laparoscopic techniques across 28 procedures; (ii) we propose a technique for video pre-training a general surgery vision transformer (GSViT) on surgical videos based on forward video prediction that can run in real time for surgical applications, and we open-source the code and weights of GSViT; (iii) we also release code and weights for procedure-specific fine-tuned versions of GSViT across 10 procedures; (iv) we demonstrate the performance of GSViT on the Cholec80 phase annotation task, showing improved performance over state-of-the-art single-frame predictors.
