RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance (2311.18681v1)
Abstract: Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows strong abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on GitHub: https://github.com/ChantalMP/RaDialog.
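As a rough illustration of what combining image features and structured pathology findings in a single LLM input could look like, the sketch below assembles a text prompt. It is not taken from the RaDialog code: the `<IMG>` placeholder, the prompt wording, and the function names `findings_to_text` and `build_prompt` are all hypothetical and chosen only for this example.

```python
from typing import Mapping

# Hypothetical placeholder standing in for one projected visual embedding.
IMAGE_PLACEHOLDER = "<IMG>"


def findings_to_text(findings: Mapping[str, bool]) -> str:
    """List the pathologies flagged as present, or state that none were found."""
    positive = sorted(name for name, present in findings.items() if present)
    return ", ".join(positive) if positive else "No Finding"


def build_prompt(findings: Mapping[str, bool], num_image_tokens: int, instruction: str) -> str:
    """Combine image placeholders, structured findings, and an instruction."""
    if num_image_tokens < 0:
        raise ValueError("num_image_tokens must be non-negative")
    image_part = " ".join([IMAGE_PLACEHOLDER] * num_image_tokens)
    # The wording of this template is an assumption made for illustration.
    return (
        f"Image information: {image_part}\n"
        f"Predicted findings: {findings_to_text(findings)}\n"
        f"{instruction}"
    )


example = build_prompt(
    {"Cardiomegaly": True, "Pleural Effusion": False, "Edema": True},
    num_image_tokens=2,
    instruction="Write a radiology report for this chest X-ray.",
)
print(example)
```

In a real system the placeholders would be replaced by embeddings from an image encoder before the sequence reaches the LLM; here they only mark where that information would go.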
Authors: Chantal Pellegrini, Ege Özsoy, Benjamin Busam, Nassir Navab, Matthias Keicher