Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography (2403.17834v3)
Abstract: While computer vision has achieved tremendous success with multimodal encoding and direct textual interaction with images via chat-based LLMs, similar advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. To address this critical gap, we introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Through various reconstructions, these scans are expanded to 50,188 volumes, totaling over 14.3 million 2D slices. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in two tasks: multi-abnormality detection and case retrieval. Remarkably, in multi-abnormality detection, CT-CLIP outperforms state-of-the-art fully supervised models across all key metrics, effectively eliminating the need for manual annotation. In case retrieval, it efficiently retrieves relevant cases using either image or textual queries, thereby enhancing knowledge dissemination. By combining CT-CLIP's vision encoder with a pretrained LLM, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Finetuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT surpasses other multimodal AI assistants, underscoring the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging, but also lays the groundwork for future innovations in medical AI and improved patient care.
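The contrastive language-image pretraining objective underlying CT-CLIP can be sketched as a symmetric InfoNCE loss over paired volume and report embeddings. The following is a minimal NumPy illustration of that CLIP-style objective, not the paper's implementation; the function name, batch layout, and temperature value are assumptions:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each encodes the same case,
    e.g. a CT volume and its radiology report.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    # Matching pairs sit on the diagonal; each row (image -> text) and each
    # column (text -> image) is a classification problem over the batch.
    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

Minimizing this loss pulls each scan's embedding toward its own report and away from the other reports in the batch, which is what enables zero-shot abnormality detection and image/text case retrieval without task-specific training.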