MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining (2403.13430v2)
Abstract: Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods that initialize model weights effectively. However, transferring pretrained models to downstream tasks may suffer from task discrepancy, since pretraining is usually formulated as an image classification or object discrimination task. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder with task-specific decoders, we conduct multi-task supervised pretraining on the SAMRS dataset, covering semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are then finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size, as well as their competitive performance against larger state-of-the-art models, validating the effectiveness of MTP.
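In code terms, the design the abstract describes is a single shared backbone whose features feed several task-specific heads trained under a joint loss. Below is a minimal PyTorch sketch of that structure, under simplifying assumptions: the head modules, the loss terms, and all names (`MultiTaskPretrainModel`, `rbox_head`, etc.) are illustrative placeholders, not the paper's actual decoders.

```python
# Minimal sketch of shared-encoder / task-specific-decoder multi-task
# pretraining. Heads and losses are simplified stand-ins, not MTP's
# actual decoder architectures.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskPretrainModel(nn.Module):
    """Shared encoder with one lightweight head per pretraining task."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder  # shared backbone (a CNN or ViT in the paper)
        self.semseg_head = nn.Conv2d(feat_dim, num_classes, 1)   # semantic seg logits
        self.instseg_head = nn.Conv2d(feat_dim, num_classes, 1)  # stand-in for an instance head
        self.rbox_head = nn.Conv2d(feat_dim, 5, 1)               # rotated box: (cx, cy, w, h, angle)

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.encoder(x)  # one forward pass; features reused by every head
        return {
            "semseg": self.semseg_head(feats),
            "instseg": self.instseg_head(feats),
            "rbox": self.rbox_head(feats),
        }


def multi_task_loss(outputs: dict, targets: dict) -> torch.Tensor:
    # Joint objective: sum of per-task losses (real systems often weight them).
    return (
        F.cross_entropy(outputs["semseg"], targets["semseg"])
        + F.cross_entropy(outputs["instseg"], targets["instseg"])
        + F.smooth_l1_loss(outputs["rbox"], targets["rbox"])
    )


# Toy usage with a tiny backbone; real pretraining uses SAMRS imagery.
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model = MultiTaskPretrainModel(encoder, feat_dim=64, num_classes=8)
outputs = model(torch.randn(2, 3, 64, 64))
targets = {
    "semseg": torch.randint(0, 8, (2, 64, 64)),
    "instseg": torch.randint(0, 8, (2, 64, 64)),
    "rbox": torch.randn(2, 5, 64, 64),
}
loss = multi_task_loss(outputs, targets)
loss.backward()  # gradients from all three tasks flow into the shared encoder
```

The transfer story follows from this shape: after pretraining, the task heads are discarded and only the shared encoder's weights are kept to initialize the downstream finetuning model.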
Authors: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang