
Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation (2401.00248v4)

Published 30 Dec 2023 in cs.CV and cs.AI

Abstract: The Segment Anything Model (SAM) represents a significant breakthrough in foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's zero-shot performance, its segmentation masks lack fine-grained details, particularly in accurately delineating object boundaries. Therefore, it is both interesting and valuable to explore whether SAM can be improved towards highly accurate object segmentation, which is known as the dichotomous image segmentation (DIS) task. To address this issue, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation, maintaining SAM's promptable design. DIS-SAM employs a two-stage approach, integrating SAM with a modified advanced network that was previously designed to handle the prompt-free DIS task. To better train DIS-SAM, we employ a ground truth enrichment strategy by modifying original mask annotations. Despite its simplicity, DIS-SAM improves on SAM, HQ-SAM, and Pi-SAM by approximately 8.5%, 6.9%, and 3.7% in maximum F-measure, respectively. Our code is available at https://github.com/Tennine2077/DIS-SAM
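For readers unfamiliar with the maximum F-measure quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed for a soft segmentation mask. It is not taken from the DIS-SAM code. The function name `max_f_measure`, the weighting `beta_sq = 0.3` (a common convention in salient object detection), and the 256 evenly spaced thresholds are all assumptions made for illustration; the abstract does not state the exact evaluation settings.

```python
def max_f_measure(pred, gt, beta_sq=0.3, num_thresholds=256):
    """Maximum F-measure of a soft mask against a binary ground truth.

    pred: flat sequence of probabilities in [0, 1] (hypothetical input layout)
    gt:   flat sequence of 0/1 labels with the same length
    beta_sq: assumed precision/recall weighting, not confirmed by the paper
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    positives = sum(gt)  # number of foreground pixels in the ground truth
    best = 0.0
    for k in range(num_thresholds):
        t = k / (num_thresholds - 1)  # threshold swept over [0, 1]
        # Binarize the prediction at t and count true/false positives.
        tp = sum(1 for p, g in zip(pred, gt) if p >= t and g == 1)
        fp = sum(1 for p, g in zip(pred, gt) if p >= t and g == 0)
        if tp == 0:
            continue  # F-measure is zero when nothing is correctly detected
        precision = tp / (tp + fp)
        recall = tp / positives
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, f)
    return best


if __name__ == "__main__":
    # Toy 4-pixel mask: two foreground pixels, two background pixels.
    print(max_f_measure([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

The sketch takes flattened pixel lists to stay dependency-free; a practical evaluation would operate on image arrays and average the score over a dataset.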
