
Zero-Shot Refinement of Buildings' Segmentation Models using SAM (2310.01845v2)

Published 3 Oct 2023 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: Foundation models have excelled in various tasks but are often evaluated only on general benchmarks. Adapting these models to specific domains, such as remote sensing imagery, remains underexplored. In remote sensing, precise building instance segmentation is vital for applications like urban planning. Convolutional Neural Networks (CNNs) perform well on this task, but their generalization can be limited. To this end, we present a novel approach that adapts foundation models to address the drop in generalization of existing models. Among several models, we focus on the Segment Anything Model (SAM), a foundation model known for class-agnostic image segmentation. We start by identifying the limitations of SAM, revealing its suboptimal performance when applied to remote sensing imagery. Moreover, SAM does not offer recognition abilities and thus cannot classify or tag the objects it localizes. To address these limitations, we introduce different prompting strategies, including integrating a pre-trained CNN as a prompt generator. This approach augments SAM with recognition abilities, a first of its kind. We evaluated our method on three remote sensing datasets: the WHU Buildings dataset, the Massachusetts Buildings dataset, and the AICrowd Mapping Challenge. For out-of-distribution performance on the WHU dataset, we achieve a 5.47% increase in IoU and a 4.81% improvement in F1-score. For in-distribution performance on the WHU dataset, we observe increases of 2.72% and 1.58% in True-Positive-IoU and True-Positive-F1 score, respectively. Our code is publicly available at https://github.com/geoaigroup/GEOAI-ECRS2023, and we hope it inspires further exploration of foundation models for domain-specific tasks within the remote sensing community.
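The abstract mentions two ideas that a small example can make concrete: using a CNN's output as a prompt for SAM, and scoring results with IoU and F1. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes the prompt is a tight bounding box around a CNN-predicted building mask (the abstract does not say which prompt types are used), and it computes the standard pixel-wise IoU and F1 only, not the True-Positive-IoU and True-Positive-F1 variants reported above. The names `mask_to_box`, `iou`, and `f1_score` are hypothetical helpers introduced for illustration.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask: 1 = building pixel, 0 = background


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight (x_min, y_min, x_max, y_max) box around foreground pixels.

    Assumption: a box like this could serve as a prompt for a promptable
    segmenter. Returns None when the mask has no foreground.
    """
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count pixel-wise true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree perfectly


def f1_score(pred: Mask, truth: Mask) -> float:
    """F1 (Dice) score: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy 3x3 example: the prediction misses one building pixel.
cnn_pred = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]

box_prompt = mask_to_box(cnn_pred)
print(box_prompt)                      # (1, 0, 2, 1)
print(iou(cnn_pred, ground_truth))     # 0.75
```

In a real pipeline the box would be passed to a promptable segmentation model and the refined mask scored against the ground truth; that step is omitted here because it depends on the model's API and weights.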

