
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models (2404.00578v1)

Published 31 Mar 2024 in cs.CV

Abstract: Medical image analysis is essential to clinical diagnosis and treatment, and it is increasingly supported by multi-modal large language models (MLLMs). However, previous research has focused primarily on 2D medical images, leaving 3D images under-explored despite their richer spatial information. This paper aims to advance 3D medical image analysis with MLLMs. To this end, we present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs tailored to various 3D medical tasks, such as image-text retrieval, report generation, visual question answering, positioning, and segmentation. Additionally, we propose M3D-LaMed, a versatile multi-modal large language model for 3D medical image analysis. Furthermore, we introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks. Through comprehensive evaluation, our method proves to be a robust model for 3D medical image analysis, outperforming existing solutions. All code, data, and models are publicly available at: https://github.com/BAAI-DCAI/M3D.

