Pixel-Level Change Detection Pseudo-Label Learning for Remote Sensing Change Captioning (2312.15311v2)
Abstract: Existing methods for Remote Sensing Image Change Captioning (RSICC) perform well in simple scenes but noticeably worse in complex ones. This limitation stems primarily from the model's limited visual ability to distinguish and locate changes. Given the inherent correlation between the change detection (CD) and RSICC tasks, we argue that pixel-level CD is important for describing the differences between images in language. However, the current RSICC dataset lacks readily available pixel-level CD labels. To address this gap, we use a model trained on existing CD datasets to derive CD pseudo-labels. We propose a network with an auxiliary CD branch supervised by these pseudo-labels. In addition, we propose a semantic fusion augment (SFA) module that fuses the feature information extracted by the CD branch, which supports a more fine-grained description of changes. Experiments show that our method achieves state-of-the-art performance and that learning from pixel-level CD pseudo-labels contributes significantly to change captioning. Our code will be available at: https://github.com/Chen-Yang-Liu/Pix4Cap
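The pseudo-label idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how pseudo-labels are binarized or which losses are used, so the 0.5 threshold, the binary cross-entropy term, the weighted sum, and all function names below are assumptions chosen for illustration only.

```python
import math

def make_pseudo_labels(change_probs, threshold=0.5):
    """Binarize per-pixel change probabilities produced by a pretrained CD model.

    Assumption: a fixed threshold is used; the abstract does not specify one.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in change_probs]

def binary_cross_entropy(pred_probs, labels, eps=1e-7):
    """Mean per-pixel binary cross-entropy between predictions and pseudo-labels."""
    total, count = 0.0, 0
    for pred_row, label_row in zip(pred_probs, labels):
        for p, y in zip(pred_row, label_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

def joint_loss(caption_loss, cd_loss, cd_weight=1.0):
    """Combine the captioning loss with the auxiliary CD loss.

    Assumption: a simple weighted sum; cd_weight is a hypothetical hyperparameter.
    """
    return caption_loss + cd_weight * cd_loss

# Hypothetical 2x2 change-probability map from a pretrained CD model.
teacher_probs = [[0.9, 0.2], [0.4, 0.7]]
pseudo_labels = make_pseudo_labels(teacher_probs)

# Hypothetical output of the captioning network's auxiliary CD branch.
branch_probs = [[0.8, 0.3], [0.1, 0.6]]
cd_loss = binary_cross_entropy(branch_probs, pseudo_labels)

# caption_loss would come from the caption decoder; 2.0 is a placeholder value.
total_loss = joint_loss(caption_loss=2.0, cd_loss=cd_loss, cd_weight=0.5)
```

In this sketch the pretrained CD model acts only as a source of supervision targets for the auxiliary branch, so no ground-truth pixel-level masks are needed on the captioning dataset.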