
MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation (2403.20253v2)

Published 29 Mar 2024 in cs.CV and cs.LG

Abstract: Medical image segmentation of anatomical structures and pathology is crucial in modern clinical diagnosis, disease study, and treatment planning. To date, great progress has been made in deep learning-based segmentation techniques, but most methods still lack data efficiency, generalizability, and interactability. Consequently, the development of new, precise segmentation methods that demand fewer labeled datasets is of utmost importance in medical image analysis. Recently, the emergence of foundation models, such as CLIP and Segment-Anything-Model (SAM), with comprehensive cross-domain representation opened the door for interactive and universal image segmentation. However, exploration of these models for data-efficient medical image segmentation is still limited, but is highly necessary. In this paper, we propose a novel framework, called MedCLIP-SAM that combines CLIP and SAM models to generate segmentation of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, we employed a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model and the recent gScoreCAM to generate prompts to obtain segmentation masks from SAM in a zero-shot setting. Additionally, we explored the use of zero-shot segmentation labels in a weakly supervised paradigm to improve the segmentation quality further. By extensively testing three diverse segmentation tasks and medical image modalities (breast tumor ultrasound, brain tumor MRI, and lung X-ray), our proposed framework has demonstrated excellent accuracy. Code is available at https://github.com/HealthX-Lab/MedCLIP-SAM.


Summary

  • The paper presents the MedCLIP-SAM framework that integrates CLIP and SAM models for text-guided segmentation in both zero-shot and weakly supervised settings.
  • It introduces the unique DHN-NCE loss to optimize BiomedCLIP fine-tuning, improving performance on sparse, small-batch medical datasets.
  • Experimental results across breast ultrasound, brain MRI, and lung X-ray modalities show superior metrics like IoU, DSC, and AUC compared to baselines.

Introduction

Medical image segmentation is a critical aspect of modern clinical practice, aiding in diagnosis, disease study, and treatment planning. Traditional deep learning-based segmentation models have advanced the field, yet they struggle with data efficiency, generalizability, and interactivity. Addressing these challenges, foundation models such as CLIP and SAM offer promising cross-domain representational capabilities, though their application to medical imaging remains underexplored.

MedCLIP-SAM is an innovative framework that combines CLIP and SAM models to facilitate medical image segmentation using text prompts, operating in both zero-shot and weakly supervised paradigms. The framework introduces the Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to enhance the fine-tuning of BiomedCLIP. Additionally, it leverages gScoreCAM to formulate prompts, generating segmentation masks in a zero-shot setting that can be further refined under weak supervision.

Figure 1: An overview of the proposed MedCLIP-SAM framework.

Efficient BiomedCLIP Fine-Tuning

Decoupled Hard Negative Noise Contrastive Estimation Loss

The DHN-NCE loss combines the InfoNCE objective with hard-negative sampling, emphasizing negatives that lie close to the anchor, which suits fine-grained medical image-text pairs. Removing the positive term from the denominator (the "decoupled" formulation) improves learning efficiency and accommodates smaller batch sizes, which is essential for sparse medical datasets.
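The idea can be sketched as follows in NumPy. This is an illustrative reconstruction, not the authors' implementation; the temperature `tau` and hardness coefficient `beta` values are assumptions.

```python
import numpy as np

def dhn_nce_loss(img_emb, txt_emb, tau=0.07, beta=0.15):
    """Sketch of a Decoupled Hard Negative NCE loss over a batch of
    image/text embedding pairs (row i of each matrix is a positive pair)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T / tau            # (B, B) cosine similarities / temperature
    B = sim.shape[0]
    off_diag = ~np.eye(B, dtype=bool)

    def one_direction(s):
        pos = np.diag(s)                          # positive-pair logits
        s_neg = s[off_diag].reshape(B, B - 1)     # negatives only
        # hard-negative weighting: up-weight negatives similar to the anchor
        w = np.exp(beta * s_neg)
        w = (B - 1) * w / w.sum(axis=1, keepdims=True)
        # "decoupled": the positive term is absent from the denominator
        return -pos + np.log((w * np.exp(s_neg)).sum(axis=1))

    # symmetric image-to-text and text-to-image terms
    return 0.5 * (one_direction(sim).mean() + one_direction(sim.T).mean())
```

Note that with perfectly aligned pairs the positive logit dominates and the loss drops well below the misaligned case, even at small batch sizes.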

Fine-Tuning Implementation

Using the MedPix dataset, BiomedCLIP is fine-tuned with the DHN-NCE loss, supported by Vision Transformer and PubMedBERT encoders. Images undergo preprocessing, normalization, and a careful train-validation split, ensuring robust learning and retrieval capacities across medical imaging contexts.

Zero-Shot and Weakly Supervised Segmentation

Zero-Shot Segmentation Strategy

Using BiomedCLIP and gScoreCAM, initial segmentation masks are produced, subsequently refined by SAM to establish pseudo-masks. This approach demonstrates versatility across various modalities, including breast ultrasound, brain MRI, and lung X-ray datasets.
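The prompt-generation step can be illustrated with a minimal sketch: a saliency map such as a gScoreCAM output is thresholded and reduced to a bounding-box prompt for a promptable segmenter such as SAM. The relative-threshold scheme and the 0.5 cutoff here are assumptions, not the paper's exact post-processing.

```python
import numpy as np

def saliency_to_box(saliency, rel_thresh=0.5):
    """Threshold a 2-D saliency map at a fraction of its maximum and return
    the bounding box [x_min, y_min, x_max, y_max] of the surviving pixels,
    usable as a box prompt for a promptable segmentation model."""
    mask = saliency >= rel_thresh * saliency.max()
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                    # nothing salient enough
        return None
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
```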

Weakly Supervised Refinement

Weakly supervised segmentation uses the zero-shot pseudo-masks to train Residual UNet models, aiming to further enhance segmentation quality. The effectiveness varies across imaging modalities, with the largest gains observed on lung X-ray segmentation.

Experimental Results

Through comprehensive validation, MedCLIP-SAM exhibits notable accuracy across retrieval and segmentation tasks. Ablation studies confirm gScoreCAM's superiority over GradCAM in generating bounding-box prompts for SAM. Fine-tuned BiomedCLIP models consistently outperform baselines across metrics such as IoU, DSC, and AUC.
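For reference, the two overlap metrics reported here can be computed as follows. This is the standard formulation of these metrics, not code from the paper.

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice similarity coefficient (DSC) and intersection-over-union (IoU)
    for binary segmentation masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return dsc, iou
```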

Discussion

MedCLIP-SAM emerges as a versatile framework for universal radiological segmentation, leveraging foundation models to facilitate interactive, text-prompt-based anatomical identification. Its innovative DHN-NCE loss contributes substantially to model efficiency and performance. Future directions include exploring MedSAM integration and expanding test scenarios to encompass diverse medical imaging applications.

Conclusion

MedCLIP-SAM integrates state-of-the-art foundation models to offer a novel approach to medical image segmentation, demonstrating strong performance and adaptability across various tasks and modalities. Its contributions pave the way for enhanced clinical applications, fostering interactive AI assistance in medical imaging.

