Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology (2402.14252v1)
Abstract: Recent advances in AI combine large language models (LLMs) with vision encoders, bringing unprecedented technical capabilities that can be leveraged for a wide range of healthcare applications. Focusing on the domain of radiology, vision-language models (VLMs) achieve good performance on tasks such as generating radiology findings from a patient's medical image or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians, who assessed the VLM concepts as valuable yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally.