
RobustSAM: Segment Anything Robustly on Degraded Images

(2406.09627)
Published Jun 13, 2024 in cs.CV, cs.AI, and eess.IV

Abstract

Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system. Nonetheless, its performance is challenged by images with degraded quality. Addressing this limitation, we propose the Robust Segment Anything Model (RobustSAM), which enhances SAM's performance on low-quality images while preserving its promptability and zero-shot generalization. Our method leverages the pre-trained SAM model with only marginal parameter increments and computational requirements. The additional parameters of RobustSAM can be optimized within 30 hours on eight GPUs, demonstrating its feasibility and practicality for typical research laboratories. We also introduce the Robust-Seg dataset, a collection of 688K image-mask pairs with different degradations designed to train and evaluate our model optimally. Extensive experiments across various segmentation tasks and datasets confirm RobustSAM's superior performance, especially under zero-shot conditions, underscoring its potential for extensive real-world application. Additionally, our method has been shown to effectively improve the performance of SAM-based downstream tasks such as single image dehazing and deblurring.

Figure: RobustSAM's superiority in refining SAM-based single image dehazing and deblurring.

Overview

  • The paper 'RobustSAM: Segment Anything Robustly on Degraded Images' focuses on enhancing the Segment Anything Model (SAM) to improve its performance on low-quality images through new architectural modules and retraining strategies.

  • RobustSAM introduces the Anti-Degradation Output Token Generation (AOTG) and Anti-Degradation Mask Feature Generation (AMFG) modules, which let the model produce image features and output tokens that are invariant to various degradations, supported by training on a substantial dataset of 688,000 image-mask pairs.

  • Evaluations demonstrated RobustSAM's superior performance over baseline SAM and other models, achieving significant IoU gains in both clear and degraded conditions across multiple datasets, making it practical for real-world applications in fields like autonomous driving and medical imaging.

RobustSAM: Enhancing Image Segmentation in Degraded Conditions

The paper "RobustSAM: Segment Anything Robustly on Degraded Images" proposes an enhancement to the Segment Anything Model (SAM), aimed at improving its performance on images subject to a variety of degradations. This enhancement, termed RobustSAM, is focused on addressing the challenges posed by low-quality images while maintaining the zero-shot generalization capabilities of SAM. The authors present notable methodological developments, detailed results, and insightful analyses underscoring the model's superiority under degradation conditions.

Methodological Innovations

RobustSAM introduces key architectural enhancements over the original SAM to tackle degraded image segmentation. Specifically, it incorporates two novel modules: the Anti-Degradation Output Token Generation (AOTG) module and the Anti-Degradation Mask Feature Generation (AMFG) module. These modules refine output tokens and image features so that they become invariant to various forms of image degradation. Robustness is enforced through consistency losses that align these refined representations with the features the original SAM extracts from the corresponding clear images.
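A minimal sketch of this consistency idea, assuming an MSE-style loss and hypothetical module/variable names (the paper's exact loss formulation may differ):

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(degraded_feat: torch.Tensor,
                             clear_feat: torch.Tensor) -> torch.Tensor:
    """Hypothetical consistency loss: pull features computed from a degraded
    image toward the features SAM extracts from the corresponding clear image."""
    return F.mse_loss(degraded_feat, clear_feat)

# Conceptual training step (names are illustrative, not the authors' code):
#   feat_degraded = amfg(sam_encoder(degraded_img))       # trainable AMFG branch
#   with torch.no_grad():
#       feat_clear = sam_encoder(clear_img)               # frozen original SAM
#   loss = feature_consistency_loss(feat_degraded, feat_clear) + seg_loss
```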

Moreover, RobustSAM leaves the pre-trained SAM weights intact and trains only the newly added parameters, limiting the computational burden compared with retraining SAM in full. These additional parameters can be optimized within 30 hours on eight GPUs, reflecting the method's practicality for typical research laboratories.
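A sketch of how only the added parameters might be exposed to the optimizer while the pre-trained SAM weights stay frozen; the module-name keys (`aotg`, `amfg`) are assumptions for illustration, not the official implementation:

```python
import torch

def new_module_parameters(model: torch.nn.Module,
                          new_module_keys=("aotg", "amfg")):
    """Freeze pre-trained weights and collect only newly added parameters,
    identified here by hypothetical substrings in their names."""
    trainable = []
    for name, param in model.named_parameters():
        if any(key in name for key in new_module_keys):
            param.requires_grad = True
            trainable.append(param)
        else:
            param.requires_grad = False
    return trainable

# Usage (illustrative): optimizer sees only the new modules' parameters.
# optimizer = torch.optim.AdamW(new_module_parameters(robust_sam), lr=1e-4)
```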

Experimental Approach

The Robust-Seg dataset is another substantial contribution of this paper. Consisting of 688,000 image-mask pairs with various types of synthetic degradations, this dataset serves both as a critical training resource and a comprehensive benchmark for evaluating segmentation models under degraded conditions. The experimental results draw from a wide range of datasets, including seen datasets like MSRA10K and LVIS and unseen zero-shot datasets such as NDD20, STREETS, FSS-1000, and COCO.
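For illustration, a toy routine in the spirit of such synthetic corruption (blur, low light, resolution loss); Robust-Seg's actual degradation pipeline is broader and may be implemented differently:

```python
import random
from PIL import Image, ImageFilter, ImageEnhance

def synthetic_degrade(img: Image.Image) -> Image.Image:
    """Apply one randomly chosen synthetic degradation to a clear image."""
    kind = random.choice(["blur", "low_light", "low_res"])
    if kind == "blur":
        return img.filter(ImageFilter.GaussianBlur(radius=random.uniform(1.0, 4.0)))
    if kind == "low_light":
        return ImageEnhance.Brightness(img).enhance(random.uniform(0.2, 0.5))
    # low_res: downscale, then upscale back to the original size
    w, h = img.size
    return img.resize((max(w // 4, 1), max(h // 4, 1))).resize((w, h))
```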

Numerical Results and Comparative Analysis

The quantitative evaluation demonstrates that RobustSAM significantly outperforms baseline SAM and existing methodologies like HQ-SAM. RobustSAM achieves IoU gains in the range of 2-5% on seen datasets under both clear and degraded conditions and extends its superiority to zero-shot generalization across unseen datasets.

For instance, on the MSRA10K dataset, RobustSAM achieved an average IoU of 0.8616, compared to SAM's 0.8207. This improved performance highlights RobustSAM's ability to maintain high accuracy on clear images while also markedly improving segmentation outcomes in the presence of degradations such as blur, low light, and adverse weather.
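For reference, the IoU metric underlying these numbers can be computed per mask as follows (a standard formulation, not code from the paper):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between a predicted and a ground-truth binary mask."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 1.0
```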

Practical and Theoretical Implications

From a practical perspective, RobustSAM enhances the robustness of SAM-based applications in real-world conditions. This is particularly relevant for fields such as autonomous driving, medical imaging, and surveillance, where image quality can vary substantially due to environmental factors. The augmented robustness directly translates to improved reliability and accuracy of downstream tasks such as dehazing and deblurring, as evidenced by the enhanced performance metrics in these tasks when using RobustSAM as a prior.

Theoretically, lightweight, degradation-invariant modules such as AOTG and AMFG open new avenues for extending existing foundation models. Coupled with consistency-loss strategies, they demonstrate a structured approach to preserving and enhancing the original model's capabilities without compromising its zero-shot learning and generalization strengths.

Future Directions

Future research could explore the integration of other sophisticated restoration techniques within the segmentation model, further enhancing its resilience to extreme degradations. Moreover, extending the Robust-Seg dataset to include more realistic degradation types and leveraging unsupervised learning techniques could provide broader evaluation scenarios and reduce dependency on synthetic data.

In conclusion, RobustSAM represents a significant step forward in the domain of image segmentation under challenging conditions. Its methodological advancements, practical enhancements, and theoretical contributions not only demonstrate high robustness and accuracy but also pave the way for future developments in robust segmentation models.
