
nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance (2309.16967v3)

Published 29 Sep 2023 in cs.CV and eess.IV

Abstract: Automatic segmentation of medical images is crucial in modern clinical workflows. The Segment Anything Model (SAM) has emerged as a versatile tool for image segmentation without specific domain training, but it requires human prompts and may have limitations in specific domains. Traditional models like nnUNet perform automatic segmentation during inference and are effective in specific domains but need extensive domain-specific training. To combine the strengths of foundational and domain-specific models, we propose nnSAM, integrating SAM's robust feature extraction with nnUNet's automatic configuration to enhance segmentation accuracy on small datasets. Our nnSAM model optimizes two main approaches: leveraging SAM's feature extraction and nnUNet's domain-specific adaptation, and incorporating a boundary shape supervision loss function based on level set functions and curvature calculations to learn anatomical shape priors from limited data. We evaluated nnSAM on four segmentation tasks: brain white matter, liver, lung, and heart segmentation. Our method outperformed others, achieving the highest DICE score of 82.77% and the lowest ASD of 1.14 mm in brain white matter segmentation with 20 training samples, compared to nnUNet's DICE score of 79.25% and ASD of 1.36 mm. A sample size study highlighted nnSAM's advantage with fewer training samples. Our results demonstrate significant improvements in segmentation performance with nnSAM, showcasing its potential for small-sample learning in medical image segmentation.


Summary

  • The paper proposes nnSAM, which integrates a pre-trained SAM encoder with nnUNet to enhance segmentation performance on limited training samples.
  • The methodology employs dual encoders and a dual-headed decoder, combining DICE, cross-entropy, and level set losses to capture anatomical shape priors.
  • Experimental results show significantly higher DICE scores and precision across brain, heart, liver, and chest imaging tasks, confirming its effectiveness.

nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance

This essay provides a detailed summary of the integration of the Segment Anything Model (SAM) and nnUNet for medical image segmentation. The proposed nnSAM model addresses the challenges of small-sample learning by combining the generalization capabilities of SAM with the domain-specific tuning of nnUNet.

Introduction

The task of segmenting medical images is of critical importance across various clinical applications, including diagnosis, treatment planning, and monitoring. Modern deep learning models such as nnUNet streamline these processes; however, they often require extensive domain-specific datasets and training. By leveraging the Segment Anything Model (SAM), nnSAM seeks to reduce these requirements while maintaining accuracy and efficacy.

SAM has demonstrated robust capabilities across diverse image segmentation tasks without requiring domain-specific training. In medical domains, however, its performance declines without tailored adaptation. Conversely, nnUNet excels in medical image segmentation but at the cost of labor-intensive, domain-specific training. nnSAM integrates the two by introducing a plug-and-play SAM encoder into the nnUNet pipeline, enhancing segmentation performance on small datasets (Figure 1).

Figure 1: The architecture of nnSAM. nnSAM integrates nnUNet's encoder with the pre-trained SAM encoder. The concatenated embeddings are input into nnUNet's decoder, which has two output layers: a segmentation head and a level set-based regression head.

Methodology

Architecture Overview

nnSAM combines nnUNet's auto-configurable framework with SAM's feature extraction potential. The nnSAM architecture consists of dual encoders: nnUNet's encoder and the pre-trained SAM encoder integrated as a Vision Transformer (ViT). Outputs from both encoders are concatenated and processed through a dual-headed decoder. This architecture balances the robust general feature extraction of SAM with the specific adaptability of nnUNet.
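The core architectural operation described above — channel-wise fusion of the two encoders' embeddings before decoding — can be sketched minimally in NumPy. The shapes and channel counts below are illustrative assumptions, not the actual dimensions of nnUNet or the SAM ViT variant used in the paper.

```python
import numpy as np

# Hypothetical feature-map shapes (batch, channels, height, width);
# real channel counts depend on the nnUNet configuration and SAM variant.
batch, h, w = 2, 16, 16
nnunet_feats = np.random.rand(batch, 32, h, w)   # trainable nnUNet encoder output
sam_feats = np.random.rand(batch, 256, h, w)     # frozen pre-trained SAM embedding

# Concatenate along the channel axis, as nnSAM does before passing the
# fused features to nnUNet's decoder.
fused = np.concatenate([nnunet_feats, sam_feats], axis=1)
print(fused.shape)  # (2, 288, 16, 16)
```

In practice the SAM embedding would first be spatially resized to match the nnUNet feature map; that interpolation step is omitted here for brevity.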

Training Process and Loss Functions

During training, nnSAM employs a multi-head decoder comprising two distinct paths: a segmentation head and a regression head. The segmentation head is optimized with a combined DICE and cross-entropy loss. Concurrently, a level set function guides the regression head in capturing curvature and shape priors, using an MSE loss plus an additional curvature-based loss. This dual-headed approach ensures that nnSAM learns anatomical shape priors alongside pixel-wise classification rather than relying on the latter alone.
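The loss terms above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the weighting of the terms is assumed, and the curvature-based regularizer is omitted for brevity.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft DICE loss between a probability map and a binary mask.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred, target, eps=1e-6):
    # Binary cross-entropy: the pixel-wise classification term.
    p = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def level_set_mse(pred_phi, target_phi):
    # MSE between predicted and ground-truth level set (signed-distance)
    # maps: the regression-head term that encodes shape priors.
    return np.mean((pred_phi - target_phi) ** 2)

# Toy tensors for illustration only.
pred = np.full((8, 8), 0.9)      # segmentation-head probabilities
mask = np.ones((8, 8))           # ground-truth mask
phi_pred = np.zeros((8, 8))      # regression-head level set output
phi_gt = np.zeros((8, 8))        # ground-truth level set map

# Equal weighting here is an assumption; the curvature term is not shown.
total = dice_loss(pred, mask) + bce_loss(pred, mask) + level_set_mse(phi_pred, phi_gt)
```

A ground-truth level set map is typically derived from the binary mask via a signed distance transform before training.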

Experimental Evaluation

The performance of nnSAM was evaluated across four medical imaging tasks: brain white matter segmentation in MR, heart substructure segmentation in CT, liver segmentation in CT, and chest X-ray segmentation. Metrics such as DICE coefficient and Average Symmetric Surface Distance (ASD) were used to gauge performance. Throughout the evaluations, nnSAM displayed superior accuracy, particularly in scenarios with limited training samples.
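The primary overlap metric used in the evaluation, the DICE coefficient, can be computed as below. This is a generic sketch of the standard metric on binary masks, not code from the paper; the toy masks are illustrative.

```python
import numpy as np

def dice_coefficient(a, b, eps=1e-6):
    # DICE = 2|A ∩ B| / (|A| + |B|) on binary masks.
    a, b = a.astype(bool), b.astype(bool)
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)

a = np.zeros((4, 4), dtype=int); a[1:3, 1:3] = 1   # 4 foreground pixels
b = np.zeros((4, 4), dtype=int); b[1:3, 1:4] = 1   # 6 foreground pixels, 4 overlapping
score = dice_coefficient(a, b)
print(round(score, 2))  # 2*4 / (4+6) = 0.8
```

ASD additionally requires extracting the boundary voxels of both masks and averaging nearest-surface distances in both directions, which needs a distance transform and is omitted here.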

MR White Matter Segmentation

In MR white matter segmentation, nnSAM achieved the highest DICE score among the compared architectures, excelling in particular when trained with as few as 20 annotated images (Figure 2).

Figure 2: Segmentation visualization results for different methods on MR brain white matter segmentation.

Table 1 displays comparative results, underscoring nnSAM's proficiency in limited sample size environments.

CT Heart Substructure Segmentation

In CT heart substructure segmentation, nnSAM outperformed the other models by combining the SAM encoder's strength in feature extraction with nnUNet's adaptive segmentation configuration (Figure 3).

Figure 3: Segmentation visualization results for different methods on CT heart substructure segmentation.

CT Liver and X-Ray Chest Segmentation

nnSAM demonstrated substantial capability in maintaining segmentation quality in CT liver and chest X-ray tasks (Figures 4 and 5). The regression head enabled the model to handle anatomical shape irregularities, as evidenced by its precision in edge cases where other models failed.

Discussion and Conclusion

nnSAM presents a significant advancement in medical image segmentation by effectively integrating the SAM and nnUNet frameworks. The model maintains high performance with minimal labeled training data, suggesting its potential as a strong baseline for medical image segmentation under data constraints. Future work could extend nnSAM to non-standardized structures such as tumors and explore volumetric 3D segmentation. The success of nnSAM demonstrates the value of fusion architectures in the continued development of adaptive and robust medical imaging solutions.
