End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding (2311.15876v4)
Abstract: Recent advances in AI foundation models have significant potential for lightening the clinical workload by mimicking the comprehensive, multi-faceted approaches used by medical professionals. In radiation oncology, integrating multiple modalities is of central importance, so the opportunities for foundation models are abundant. Inspired by this, we present RO-LMM, a multi-purpose, comprehensive large multimodal model (LMM) tailored to the field of radiation oncology. The model handles a series of tasks within the clinical workflow, including clinical context summarization, radiation treatment plan suggestion, and plan-guided target volume segmentation, by leveraging the capabilities of LMMs. In particular, to perform consecutive clinical tasks without error accumulation, we present a novel Consistency Embedding Fine-Tuning (CEFTune) technique, which boosts the LMM's robustness to noisy inputs while preserving consistent handling of clean inputs. We further extend this concept to an LMM-driven segmentation framework, leading to a novel Consistency Embedding Segmentation (CESEG) technique. Experimental results, including multi-centre validation, confirm that RO-LMM with CEFTune and CESEG achieves promising performance on multiple clinical tasks with strong generalization capabilities.
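The abstract does not give the exact CEFTune formulation, but the described idea — perturbing input embeddings while constraining the model to behave consistently on clean and noisy versions — can be sketched as below. This is a minimal illustration, assuming NEFTune-style uniform embedding noise and an MSE consistency term; the functions `neftune_noise`, `consistency_loss`, `ceftune_style_loss`, and the `alpha`/`lam` hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def neftune_noise(embeddings, alpha=5.0, rng=None):
    """NEFTune-style perturbation: uniform noise scaled by alpha / sqrt(L * d),
    where (L, d) is the (sequence length, embedding dim) of the input."""
    rng = rng if rng is not None else np.random.default_rng(0)
    L, d = embeddings.shape
    scale = alpha / np.sqrt(L * d)
    return embeddings + rng.uniform(-scale, scale, size=(L, d))

def consistency_loss(clean_out, noisy_out):
    """Penalize divergence between the model's outputs on clean vs. noisy
    embeddings (MSE form is an assumption here)."""
    return float(np.mean((clean_out - noisy_out) ** 2))

def ceftune_style_loss(task_loss_noisy, clean_out, noisy_out, lam=1.0):
    """Combined objective: the task loss computed on the noisy pass, plus a
    weighted consistency regularizer tying it back to the clean pass."""
    return task_loss_noisy + lam * consistency_loss(clean_out, noisy_out)
```

In training, `clean_out` and `noisy_out` would be the model's outputs for the same input with and without embedding noise, and `lam` would trade off robustness to noisy upstream predictions against fidelity on clean inputs.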