
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis (2310.20381v5)

Published 31 Oct 2023 in cs.CV and cs.AI

Abstract: This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on three representative tasks: radiology report generation, medical visual question answering, and medical visual grounding. For each task, a set of prompts is designed to elicit the corresponding capability of GPT-4V and produce sufficiently good outputs. Three evaluation methods, namely quantitative analysis, human evaluation, and case study, are employed to achieve an in-depth and extensive evaluation. Our evaluation shows that GPT-4V excels in understanding medical images, generates high-quality radiology reports, and effectively answers questions about medical images. Meanwhile, its performance on medical visual grounding needs substantial improvement. In addition, we observe a discrepancy between the outcomes of quantitative analysis and those of human evaluation. This discrepancy suggests the limitations of conventional metrics in assessing the performance of LLMs like GPT-4V and the need to develop new metrics for automatic quantitative analysis.

Authors (10)
  1. Yingshu Li (11 papers)
  2. Yunyi Liu (10 papers)
  3. Zhanyu Wang (22 papers)
  4. Xinyu Liang (11 papers)
  5. Lingqiao Liu (114 papers)
  6. Lei Wang (977 papers)
  7. Leyang Cui (50 papers)
  8. Zhaopeng Tu (135 papers)
  9. Longyue Wang (87 papers)
  10. Luping Zhou (72 papers)
Citations (34)
