On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study (2402.14162v1)

Published 21 Feb 2024 in cs.CV and cs.AI

Abstract: Recently, LLMs have taken the spotlight in natural language processing, and integrating them with vision enables users to explore emergent abilities on multimodal data. Visual LLMs (VLMs) such as LLaVA, Flamingo, and CLIP have demonstrated impressive performance on a range of visio-linguistic tasks, suggesting many potential applications in biomedical imaging. However, little prior work has examined how well such large models can diagnose disease. In this work, we study the zero-shot and few-shot robustness of VLMs on medical imaging analysis tasks. Our comprehensive experiments demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays.
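To make the zero-shot setting the abstract describes concrete, below is a minimal sketch of zero-shot classification with a CLIP-style model, using the OpenCLIP library cited in the references. The checkpoint name, label prompts, and image path are illustrative assumptions; the paper's actual models, prompts, and datasets may differ.

```python
# Hedged sketch of zero-shot image classification with OpenCLIP.
# Assumptions (not from the paper): the ViT-B-32 checkpoint, the
# chest X-ray prompts, and the file path "chest_xray.png".
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Candidate labels phrased as natural-language prompts.
labels = ["a chest X-ray of healthy lungs", "a chest X-ray showing pneumonia"]
text = tokenizer(labels)
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is cosine similarity, then turn the
    # per-label similarities into a probability distribution.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

In a few-shot variant, a handful of labeled examples can be interleaved with the query (as Flamingo-style models support), but the zero-shot case above matches the simplest setting the abstract describes.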

References (27)
  1. A. T. Sahlol, P. Kollmannsberger, and A. A. Ewees, “Efficient classification of white blood cell leukemia with improved swarm optimization of deep features,” Scientific Reports, 2020.
  2. C. Liu and Q. Yin, “Automatic diagnosis of COVID-19 using a tailored transformer-like network,” in JPCS, 2021.
  3. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021.
  4. J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds et al., “Flamingo: a visual language model for few-shot learning,” NeurIPS, 2022.
  5. H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” arXiv preprint arXiv:2304.08485, 2023.
  6. OpenAI, “GPT-4 technical report,” 2023.
  7. S. Zhang, Y. Xu, N. Usuyama, J. Bagga, R. Tinn, S. Preston, R. Rao, M. Wei, N. Valluri, C. Wong et al., “Large-scale domain-specific pretraining for biomedical vision-language processing,” arXiv preprint arXiv:2303.00915, 2023.
  8. Z. Yan, K. Zhang, R. Zhou, L. He, X. Li, and L. Sun, “Multimodal ChatGPT for medical applications: an experimental study of GPT-4V,” arXiv preprint arXiv:2310.19061, 2023.
  9. K. Singhal, T. Tu, J. Gottweis, R. Sayres, E. Wulczyn, L. Hou, K. Clark, S. Pfohl, H. Cole-Lewis, D. Neal et al., “Towards expert-level medical question answering with large language models,” arXiv preprint arXiv:2305.09617, 2023.
  10. A. J. Thirunavukarasu, D. S. J. Ting, K. Elangovan, L. Gutierrez, T. F. Tan, and D. S. W. Ting, “Large language models in medicine,” Nature Medicine, 2023.
  11. S. Wang, Z. Zhao, X. Ouyang, Q. Wang, and D. Shen, “ChatCAD: Interactive computer-aided diagnosis on medical image using large language models,” arXiv preprint arXiv:2302.07257, 2023.
  12. A. M. G. Allah, A. M. Sarhan, and N. M. Elshennawy, “Edge U-Net: Brain tumor segmentation using MRI based on deep U-Net model with boundary information,” Expert Systems with Applications, 2023.
  13. J. Cheng, “Brain tumor dataset,” Apr. 2017. [Online]. Available: https://figshare.com/articles/dataset/brain_tumor_dataset/1512427
  14. R. D. Labati, V. Piuri, and F. Scotti, “ALL-IDB: The acute lymphoblastic leukemia image database for image processing,” in ICIP, 2011.
  15. L. H. Vogado, R. M. Veras, F. H. Araujo, R. R. Silva, and K. R. Aires, “Leukemia diagnosis in blood slides using transfer learning in CNNs and SVM for classification,” Engineering Applications of Artificial Intelligence, 2018.
  16. F. Liu, T. Zhu, X. Wu, B. Yang, C. You, C. Wang, L. Lu, Z. Liu, Y. Zheng, X. Sun et al., “A medical multimodal large language model for future pandemics,” NPJ Digital Medicine, vol. 6, no. 1, p. 226, 2023.
  17. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
  18. Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in CVPR, 2022.
  19. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” NeurIPS, 2017.
  20. H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.
  21. J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” NeurIPS, 2022.
  22. OpenAI, “GPT-4 technical report,” 2023.
  23. J. P. Cohen, P. Morrison, L. Dao, K. Roth, T. Q. Duong, and M. Ghassemi, “COVID-19 image data collection: Prospective predictions are the future,” arXiv preprint arXiv:2006.11988, 2020.
  24. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, 1989.
  25. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
  26. G. Ilharco, M. Wortsman, R. Wightman, C. Gordon, N. Carlini, R. Taori, A. Dave, V. Shankar, H. Namkoong, J. Miller, H. Hajishirzi, A. Farhadi, and L. Schmidt, “OpenCLIP,” 2021.
  27. A. Awadalla, I. Gao, J. Gardner, J. Hessel, Y. Hanafy, W. Zhu, K. Marathe, Y. Bitton, S. Gadre, S. Sagawa et al., “OpenFlamingo: An open-source framework for training large autoregressive vision-language models,” arXiv preprint arXiv:2308.01390, 2023.