
BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys (2310.10765v3)

Published 16 Oct 2023 in cs.CV, cs.AI, and cs.CL

Abstract: Rapid progress has been made in instruction-learning for image editing with natural-language instruction, as exemplified by InstructPix2Pix. In biomedicine, such methods can be applied to counterfactual image generation, which helps differentiate causal structure from spurious correlation and facilitate robust image interpretation for disease progression modeling. However, generic image-editing models are ill-suited for the biomedical domain, and counterfactual biomedical image generation is largely underexplored. In this paper, we present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning from multimodal patient journeys. Given a patient with two biomedical images taken at different time points, we use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression. The resulting triples (prior image, progression description, new image) are then used to train a latent diffusion model for counterfactual biomedical image generation. Given the relative scarcity of image time series data, we introduce a two-stage curriculum that first pretrains the denoising network using the much more abundant single image-report pairs (with dummy prior image), and then continues training using the counterfactual triples. Experiments using the standard MIMIC-CXR dataset demonstrate the promise of our method. In a comprehensive battery of tests on counterfactual medical image generation, BiomedJourney substantially outperforms prior state-of-the-art methods in instruction image editing and medical image generation such as InstructPix2Pix and RoentGen. To facilitate future study in counterfactual medical generation, we plan to release our instruction-learning code and pretrained models.
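The data flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Study` and `Example` types, the `day` field, the pairing of consecutive visits, and the `toy_describe` function are all hypothetical. In the paper, the progression description comes from GPT-4 applied to the two imaging reports; here a trivial string template stands in for that step. `None` plays the role of the dummy prior image used in the first curriculum stage.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Study:
    """One imaging study (hypothetical schema, for illustration only)."""
    patient_id: str
    day: int      # assumed ordering key: days since the patient's first visit
    image: str    # path or identifier of the image
    report: str   # free-text imaging report


@dataclass(frozen=True)
class Example:
    """One training example: (prior image, text, target image)."""
    prior_image: Optional[str]  # None stands in for the dummy prior image
    text: str
    target_image: str


def stage1_examples(studies: List[Study]) -> List[Example]:
    """Stage 1: single image-report pairs with a dummy prior image."""
    return [Example(None, s.report, s.image) for s in studies]


def stage2_examples(
    studies: List[Study],
    describe: Callable[[str, str], str],
) -> List[Example]:
    """Stage 2: counterfactual triples built from each patient's journey.

    Pairing consecutive visits is an assumption made for this sketch.
    """
    ordered = sorted(studies, key=lambda s: (s.patient_id, s.day))
    examples = []
    for _, group in groupby(ordered, key=lambda s: s.patient_id):
        visits = list(group)
        for prior, new in zip(visits, visits[1:]):
            text = describe(prior.report, new.report)
            examples.append(Example(prior.image, text, new.image))
    return examples


def toy_describe(prior_report: str, new_report: str) -> str:
    """Placeholder for the GPT-4 step that summarizes disease progression."""
    return f"Change from '{prior_report}' to '{new_report}'."


studies = [
    Study("p1", 0, "a.png", "clear lungs"),
    Study("p1", 30, "b.png", "small effusion"),
    Study("p2", 0, "c.png", "cardiomegaly"),
]
pretrain_set = stage1_examples(studies)
finetune_set = stage2_examples(studies, toy_describe)
```

With these three toy studies, every study contributes a stage-one example, while only patient `p1` has two time points and therefore yields a stage-two triple. That imbalance mirrors the relative scarcity of image time series that motivates the two-stage curriculum.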
