Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images (2106.10605v2)

Published 20 Jun 2021 in cs.CV

Abstract: Supervised learning for semantic segmentation requires a large number of labeled samples, which is difficult to obtain in the field of remote sensing. Self-supervised learning (SSL) can be used to solve such problems by pre-training a general model with a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Contrastive learning is a typical method of SSL that can learn general invariant features. However, most existing contrastive learning methods are designed for classification tasks to obtain an image-level representation, which may be suboptimal for semantic segmentation tasks requiring pixel-level discrimination. Therefore, we propose a global style and local matching contrastive learning network (GLCNet) for remote sensing image semantic segmentation. Specifically, 1) the global style contrastive learning module is used to better learn an image-level representation, as we consider that style features can better represent the overall image features. 2) The local features matching contrastive learning module is designed to learn representations of local regions, which is beneficial for semantic segmentation. The experimental results show that our method mostly outperforms SOTA self-supervised methods and the ImageNet pre-training method. Specifically, with 1% annotation from the original dataset, our approach improves Kappa by 6% on the ISPRS Potsdam dataset relative to the existing baseline. Moreover, our method outperforms supervised learning methods when there are some differences between the datasets of upstream tasks and downstream tasks. Since SSL could directly learn the essential characteristics of data from unlabeled data, which is easy to obtain in the remote sensing field, this may be of great significance for tasks such as global mapping. The source code is available at https://github.com/GeoX-Lab/G-RSIM.

Citations (102)

Summary

  • The paper introduces GLCNet, a novel contrastive self-supervised approach that leverages both global style and local pixel-level features for semantic segmentation.
  • The methodology integrates global style contrastive learning with local matching features to achieve superior performance on datasets like ISPRS Potsdam and DGLC.
  • The findings demonstrate that self-supervised learning can significantly reduce the need for extensive labeled data in high-resolution remote sensing applications.

Overview of Contrastive Self-Supervised Learning for Semantic Segmentation in Remote Sensing

The paper "Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images" addresses the challenges of semantic segmentation for high-resolution (HR) remote sensing images, particularly the dependency on large labeled datasets. The research centers on improving self-supervised learning (SSL) so that segmentation models can be pre-trained without annotations and fine-tuned with few labels. To this end, the paper introduces the Global and Local Contrastive Learning Network (GLCNet), which learns representations at both the image (global) and pixel (local) level for remote sensing imagery.

Problem Context and Contributions

Remote sensing imagery is an invaluable source for numerous applications, such as urban planning and agricultural management. The high-resolution nature of these images necessitates complex, pixel-level semantic segmentation for accurate information extraction. Traditional approaches, reliant on supervised learning, demand a substantial amount of labeled data, which is both costly and labor-intensive to produce, especially at the pixel level and across diverse geographic contexts.

The paper makes several key contributions:

  1. Introduction of Self-Supervised Learning (SSL) to Remote Sensing: It applies SSL specifically to remote sensing image (RSI) semantic segmentation, allowing models to be pre-trained on abundant unlabeled data and then fine-tuned with few labels (a minimal two-stage sketch follows this list).
  2. Development of a Novel Framework, GLCNet: GLCNet balances global style features with local matching features, improving the model's ability to interpret and segment remote sensing images accurately.
  3. Empirical Validation Across Multiple Datasets: The method is tested against existing self-supervised and supervised techniques, showing improvements in Kappa statistics with minimal labeled data available for fine-tuning.
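
The two-stage workflow behind contribution 1 can be summarized in a few lines of code. The sketch below is illustrative only: names such as `pretrain_model`, `unlabeled_loader`, `labeled_loader`, and `seg_head`, as well as the optimizer settings and epoch counts, are placeholder assumptions rather than the authors' API; the SSL loss itself is detailed in the Methodological Insights section.

```python
# Illustrative two-stage pipeline: SSL pre-training on unlabeled imagery,
# then supervised fine-tuning on a small labeled subset. All names and
# hyperparameters here are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


def pretrain(pretrain_model: nn.Module, unlabeled_loader, epochs: int = 100) -> nn.Module:
    """Stage 1: self-supervised pre-training on abundant unlabeled images.

    `pretrain_model` is assumed to wrap the shared encoder plus the contrastive
    projection heads and to return a scalar SSL loss for two augmented views.
    """
    opt = torch.optim.Adam(pretrain_model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for view1, view2, region_pairs in unlabeled_loader:
            loss = pretrain_model(view1, view2, region_pairs)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return pretrain_model.encoder  # only the encoder is reused downstream


def finetune(encoder: nn.Module, seg_head: nn.Module, labeled_loader, epochs: int = 50) -> nn.Module:
    """Stage 2: supervised fine-tuning with very few labels (e.g. 1% of the dataset)."""
    model = nn.Sequential(encoder, seg_head)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, mask in labeled_loader:
            loss = criterion(model(image), mask)  # pixel-wise cross-entropy
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```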

Methodological Insights

Contrastive Learning Principles: The approach builds on contrastive self-supervised learning, in which a model learns to pull together the representations of two augmented views of the same image (a positive pair) while pushing them away from the representations of other images (negatives). The method comprises two principal components, both sketched in code after this list:

  • Global Style Contrastive Learning: Utilizes style features, namely the channel-wise mean and variance of the feature maps, instead of average-pooled features to better capture global image characteristics.
  • Local Matching Contrastive Learning: Focuses on learning representations of matched local regions across the two augmented views, which is critical for pixel-level semantic segmentation.
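
The following PyTorch sketch illustrates how the two modules could fit together under common contrastive-learning conventions. It is a minimal sketch, not the paper's implementation: the encoder/decoder channel counts, projection-head sizes, the NT-Xent-style loss, the `region_pairs` format, and the loss weighting `lam` are all assumptions; the authors' code is in the linked GeoX-Lab/G-RSIM repository.

```python
# Sketch of GLCNet-style pre-training losses. Channel counts, head sizes,
# the NT-Xent formulation, and the loss weighting are illustrative
# assumptions; consult the GeoX-Lab/G-RSIM repository for the actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def style_features(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise mean and standard deviation of a (B, C, H, W) feature map -> (B, 2C)."""
    mean = feat.mean(dim=(2, 3))
    std = feat.var(dim=(2, 3), unbiased=False).add(1e-6).sqrt()
    return torch.cat([mean, std], dim=1)


class ProjectionHead(nn.Module):
    """Small MLP mapping features into the space where the contrastive loss is computed."""
    def __init__(self, in_dim: int, hidden: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=1)


def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss: matched rows of z1/z2 are positives, everything else is a negative."""
    z = torch.cat([z1, z2], dim=0)                       # (2N, D), already L2-normalized
    sim = z @ z.t() / tau
    mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


class GLCNetPretrainer(nn.Module):
    """Wraps an encoder/decoder and computes the global-style and local-matching losses."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 enc_channels: int = 512, dec_channels: int = 64, lam: float = 0.5):
        super().__init__()
        self.encoder, self.decoder, self.lam = encoder, decoder, lam
        self.head_global = ProjectionHead(2 * enc_channels)   # style vector = [mean, std]
        self.head_local = ProjectionHead(dec_channels)

    def forward(self, view1, view2, region_pairs):
        f1, f2 = self.encoder(view1), self.encoder(view2)      # (B, enc_channels, h, w)

        # Global style contrast: style vectors instead of global average pooling.
        loss_global = nt_xent(self.head_global(style_features(f1)),
                              self.head_global(style_features(f2)))

        # Local matching contrast: pool regions that cover the same ground area
        # in both views (their coordinates come from the recorded augmentations).
        d1, d2 = self.decoder(f1), self.decoder(f2)             # (B, dec_channels, H, W)
        locals1, locals2 = [], []
        for b, boxes in enumerate(region_pairs):
            for (ya1, xa1, ya2, xa2), (yb1, xb1, yb2, xb2) in boxes:
                locals1.append(d1[b, :, ya1:ya2, xa1:xa2].mean(dim=(1, 2)))
                locals2.append(d2[b, :, yb1:yb2, xb1:xb2].mean(dim=(1, 2)))
        loss_local = nt_xent(self.head_local(torch.stack(locals1)),
                             self.head_local(torch.stack(locals2)))

        # Weighted combination of the two losses (the exact weighting is an assumption).
        return self.lam * loss_global + (1 - self.lam) * loss_local
```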

Key Results

The GLCNet approach demonstrates superior performance across several datasets, including the ISPRS Potsdam, DGLC, Hubei, and Xiangtan datasets. Notably, with only 1% of the labeled data used for fine-tuning, the approach surpasses previous state-of-the-art self-supervised methods and is competitive with supervised ImageNet pre-training. The improvements are particularly pronounced when the pre-training and fine-tuning datasets differ, underscoring the method's robustness.

Implications and Future Directions

The paper's findings suggest significant implications for future remote sensing applications. By reducing dependency on large labeled datasets, GLCNet offers a scalable solution for global mapping and environmental monitoring tasks, where accessibility to labeled data across diverse territories is limited.

Moreover, future research might incorporate true spatio-temporal data to better reflect real-world conditions, enhancing the model's learning of temporally invariant features beyond the artificial data augmentations used in the current work. Additionally, leveraging advances in adversarial learning could further bolster model robustness against variable imaging conditions.

In conclusion, the paper illustrates a meaningful advance in contrastive self-supervised learning applications within remote sensing, offering a practical pathway to unlocking the potential of satellite imagery interpretation through less labor-intensive means. The proposed GLCNet framework exemplifies a significant stride toward efficient, scalable semantic segmentation models suitable for diverse, global datasets.
