ClassWise-SAM-Adapter: Parameter Efficient Fine-tuning Adapts Segment Anything to SAR Domain for Semantic Segmentation (2401.02326v1)
Abstract: In the realm of artificial intelligence, the emergence of foundation models, backed by high computing capabilities and extensive data, has been revolutionary. The Segment Anything Model (SAM), built on a Vision Transformer (ViT) backbone with millions of parameters and trained on the vast SA-1B dataset, excels in a wide range of segmentation scenarios thanks to its rich semantic representations and strong generalization ability. This success of visual foundation models has stimulated continuous research on specific downstream tasks in computer vision. The ClassWise-SAM-Adapter (CWSAM) is designed to adapt the high-performing SAM for land-cover classification on spaceborne Synthetic Aperture Radar (SAR) images. CWSAM freezes most of SAM's parameters and incorporates lightweight adapters for parameter-efficient fine-tuning, while a classwise mask decoder is designed to perform semantic segmentation. This adapter-tuning method enables efficient land-cover classification of SAR images, balancing accuracy against computational demand. In addition, a task-specific input module injects low-frequency information from SAR images through MLP-based layers to further improve model performance. In extensive experiments against conventional state-of-the-art semantic segmentation algorithms, CWSAM delivers better performance with fewer computing resources, highlighting the potential of leveraging foundation models like SAM for specific downstream tasks in the SAR domain. The source code is available at: https://github.com/xypu98/CWSAM.
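The adapter strategy described in the abstract follows the standard parameter-efficient fine-tuning pattern: the pretrained encoder stays frozen and only small bottleneck modules are trained. The PyTorch sketch below illustrates that pattern; the module names (`Adapter`, `AdaptedBlock`) and dimensions are illustrative assumptions, not CWSAM's exact layer layout.

```python
# Minimal sketch of adapter-based PEFT: freeze a pretrained transformer block
# and learn only a small bottleneck adapter on top of it. Illustrative only.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the frozen block's output passes through
        # unchanged plus a small learned correction.
        return x + self.up(self.act(self.down(x)))


class AdaptedBlock(nn.Module):
    """Wraps a frozen pretrained block with a trainable adapter."""

    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False  # freeze pretrained SAM weights
        self.adapter = Adapter(dim)  # only these parameters are trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.block(x))
```

In this setup only the adapter weights would be handed to the optimizer, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`, which is what keeps the fine-tuning cost low relative to updating the full ViT.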
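SAM's original mask decoder predicts class-agnostic binary masks, so a classwise variant must emit one logit map per land-cover category to support semantic segmentation. The following is a minimal sketch of such a head under assumed feature shapes, not CWSAM's published decoder architecture.

```python
# Hedged sketch of a classwise mask head: instead of a single foreground mask,
# predict one logit map per land-cover class. Shapes are illustrative.
import torch
import torch.nn as nn


class ClasswiseMaskHead(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 5):
        super().__init__()
        self.head = nn.Sequential(
            nn.ConvTranspose2d(dim, dim // 2, kernel_size=2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(dim // 2, num_classes, kernel_size=2, stride=2),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, dim, H, W) decoder features -> (B, num_classes, 4H, 4W)
        # per-pixel class logits, trainable with a standard cross-entropy loss
        return self.head(feats)
```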
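For the task-specific input module, one plausible reading of "injects low-frequency information of SAR images by MLP-based layers" is to isolate low-frequency content with an FFT low-pass filter and embed it with MLP layers whose output is fused with the frozen encoder's patch embeddings. The cutoff ratio, layer sizes, and fusion point below are assumptions for illustration.

```python
# Assumed sketch of a low-frequency input module: FFT low-pass filter the SAR
# image, then map it to token embeddings with MLP-based layers.
import torch
import torch.nn as nn


def low_pass(img: torch.Tensor, ratio: float = 0.25) -> torch.Tensor:
    """Keep only the central (low) frequencies of a (B, C, H, W) image."""
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    _, _, H, W = img.shape
    mask = torch.zeros(H, W, device=img.device)
    h, w = int(H * ratio / 2), int(W * ratio / 2)
    mask[H // 2 - h : H // 2 + h, W // 2 - w : W // 2 + w] = 1.0
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1)))
    return filtered.real


class LowFreqInput(nn.Module):
    """MLP-based layers that embed low-frequency SAR content as tokens."""

    def __init__(self, patch: int = 16, in_ch: int = 1, dim: int = 768):
        super().__init__()
        # Patchify the filtered image, then refine tokens with an MLP.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(low_pass(img)).flatten(2).transpose(1, 2)  # (B, N, dim)
        # These tokens would be added to the frozen encoder's patch embeddings.
        return self.mlp(tokens)
```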
Authors: Xinyang Pu, Hecheng Jia, Linghao Zheng, Feng Wang, Feng Xu