Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography (2403.17834v3)
Abstract: While computer vision has achieved tremendous success with multimodal encoding and direct textual interaction with images via chat-based LLMs, similar advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. To address this critical gap, we introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Through various reconstructions, these scans are expanded to 50,188 volumes, totaling over 14.3 million 2D slices. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in two tasks: multi-abnormality detection and case retrieval. Remarkably, in multi-abnormality detection, CT-CLIP outperforms state-of-the-art fully supervised models across all key metrics, effectively eliminating the need for manual annotation. In case retrieval, it efficiently retrieves relevant cases using either image or textual queries, thereby enhancing knowledge dissemination. By combining CT-CLIP's vision encoder with a pretrained LLM, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Finetuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT surpasses other multimodal AI assistants, underscoring the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging, but also lays the groundwork for future innovations in medical AI and improved patient care.
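The contrastive language-image pretraining objective underlying CT-CLIP can be sketched as a symmetric InfoNCE loss over paired volume and report embeddings. The following is a minimal NumPy illustration of that CLIP-style objective, not the paper's implementation; the function name, batch layout, and temperature value are assumptions:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each encodes the same case,
    e.g. a CT volume and its radiology report.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    # Matching pairs sit on the diagonal; each row (image -> text) and each
    # column (text -> image) is a classification problem over the batch.
    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Minimizing this loss pulls each scan's embedding toward its own report and away from the other reports in the batch, which is what enables zero-shot abnormality detection and image/text case retrieval without task-specific training.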