- The paper introduces DeepMask, a ResNet-based algorithm that significantly improves cloud and cloud shadow detection in optical satellite imagery.
- The methodology uses 15x15 pixel local patches with a dual-module setup, optimizing computational efficiency and flexibility with arbitrary input sizes.
- DeepMask outperforms CFMask with an 8.2% overall accuracy gain and a 24.98% improvement on snow/ice detection, validated on Landsat 8 datasets.
DeepMask: An Algorithm for Cloud Detection in Satellite Images Using ResNet
Introduction
The paper addresses the challenge of accurately detecting clouds and cloud shadows in optical satellite imagery, a frequent impediment in remote sensing. Accurate cloud masking is vital for using remotely sensed data in downstream applications such as land classification. The prevalent method, CFMask, is effective overall but struggles in specific scenarios, such as distinguishing clouds from bright surfaces or detecting thin clouds. The research therefore proposes DeepMask, which uses a ResNet-based convolutional neural network to improve cloud detection across diverse land types.
Methodology and DeepMask Architecture
DeepMask employs a ResNet-based approach for pixel-level cloud and shadow detection. The network is a unified pipeline with two main components: a local region extractor and a ResNet backbone for classification. For each pixel to be classified, DeepMask extracts a local spectral patch from the satellite image around that pixel and passes the patch to the backbone, which predicts the label of the central pixel. Feeding 15x15-pixel local patches rather than the entire scene improves computational efficiency, which is particularly useful for high-resolution satellite imagery.
Figure 1: DeepMask algorithm is a unified pipeline, consisting of the local region extractor (module A) and the ResNet backbone (module B). A zoomed-in view (module C) of a typical residual block is also given.
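The local region extractor (module A) can be illustrated with a minimal sketch. The function names `extract_patch` and `iter_patches` are hypothetical, and mirroring indices at the image border is an assumption, since the summary does not say how border pixels are handled. The image is represented as nested lists so the sketch needs nothing beyond the standard library.

```python
def extract_patch(image, row, col, size=15):
    """Return the size x size neighborhood centered on (row, col).

    `image` is a list of rows, each row a list of pixels, each pixel a
    list of band values. Out-of-range indices are mirrored back into the
    image (an assumed border policy, not taken from the paper).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the patch has a central pixel")
    height, width = len(image), len(image[0])
    half = size // 2

    def reflect(i, n):
        # Mirror index i into [0, n) without repeating the edge pixel.
        if n == 1:
            return 0
        period = 2 * (n - 1)
        i %= period  # Python's % always yields a non-negative result here
        return i if i < n else period - i

    return [
        [image[reflect(row + dr, height)][reflect(col + dc, width)]
         for dc in range(-half, half + 1)]
        for dr in range(-half, half + 1)
    ]


def iter_patches(image, size=15):
    """Yield ((row, col), patch) for every pixel of a scene of any size."""
    for r in range(len(image)):
        for c in range(len(image[0])):
            yield (r, c), extract_patch(image, r, c, size)
```

Because each prediction depends only on a fixed-size patch, the same loop applies to a scene of any height and width, which is one way to read the arbitrary-input-size property described in the text.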
The residual learning framework in ResNet mitigates the vanishing gradient problem, so the network can be made deeper without degrading performance. Unlike deep learning models that require a fixed input size, DeepMask accepts inputs of arbitrary dimensions, which improves its versatility.
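For reference, the residual formulation behind this argument can be written as follows. The notation is the standard one for residual blocks and is not taken from this summary: $\mathbf{x}$ is the block input, $\mathcal{F}$ the stacked layers with weights $\{W_i\}$, and $\mathcal{L}$ the training loss.

```latex
\[
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
= \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
\left( I + \frac{\partial \mathcal{F}}{\partial \mathbf{x}} \right)
\]
```

The identity term $I$ passes the gradient through the skip connection unchanged, so the signal reaching earlier layers does not vanish even when $\partial \mathcal{F} / \partial \mathbf{x}$ is small.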
Experiments and Evaluation
Data Preparation
DeepMask uses manually labeled cloud masks from the Landsat 8 CCA dataset as ground truth. Inputs are derived from atmospherically corrected reflectance bands and do not depend on thermal bands, which makes the model portable to satellite platforms that lack thermal sensors (Figure 2).
Figure 2: Data preparation flowchart.
Land-Type Specific and General Models
The evaluation considers two model types: land-type-specific models and a general model trained across all land types. The land-type-specific models were notably more accurate, particularly over challenging surfaces such as snow and ice, where DeepMask showed a 24.98% improvement over CFMask.
Ablation Study
An ablation study was conducted to assess the impact of individual spectral bands on performance. The findings suggest that accuracy drops by 3-4% when critical bands such as red and blue are excluded. Despite these reductions, DeepMask remained robust with reduced spectral input, which supports its applicability to sensors with limited band availability.
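The leave-one-band-out procedure can be sketched as a simple loop. This is an illustration under stated assumptions rather than the authors' protocol: `accuracy_drop_per_band` is a hypothetical helper, and `evaluate` is a hypothetical callback that trains and scores a model on the given subset of band indices and returns its accuracy.

```python
def accuracy_drop_per_band(band_names, evaluate):
    """Return {band name: accuracy lost when that band is excluded}.

    `evaluate(kept_indices)` is a caller-supplied (hypothetical) function
    that returns the accuracy of a model using only the listed bands.
    """
    all_indices = list(range(len(band_names)))
    baseline = evaluate(all_indices)  # accuracy with every band present
    drops = {}
    for i, name in enumerate(band_names):
        kept = [j for j in all_indices if j != i]  # leave band i out
        drops[name] = baseline - evaluate(kept)
    return drops
```

A positive value means the band carries information the model could not recover from the remaining bands.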
Quantitative and Qualitative Results
In quantitative evaluations, DeepMask achieved an average accuracy of 93.56%, surpassing CFMask by 8.2%. Metrics such as precision and recall demonstrate DeepMask’s reliability in classification. The general model’s slight decrease in accuracy serves as a benchmark for scalability.
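The metrics named above can be computed from a predicted mask and a ground-truth mask as in the following sketch. It assumes binary masks (1 for cloud, 0 for clear) stored as nested lists; the function name `binary_mask_metrics` is hypothetical, and the paper's exact class definitions and averaging may differ.

```python
def binary_mask_metrics(predicted, truth):
    """Return accuracy, precision, and recall for two binary masks.

    Both masks are lists of rows with values 1 (cloud) or 0 (clear).
    """
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1  # cloud correctly detected
            elif p == 1 and t == 0:
                fp += 1  # clear pixel flagged as cloud
            elif p == 0 and t == 1:
                fn += 1  # cloud pixel missed
            else:
                tn += 1  # clear pixel correctly left clear
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Precision penalizes clear pixels flagged as cloud, such as bright snow, while recall penalizes missed cloud pixels, so reporting both alongside overall accuracy gives a fuller picture of mask quality.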
Visual assessments show DeepMask’s robust identification of clouds in complex environments and its proficiency in differentiating between cloud and high-reflective surfaces such as snow or water. Visuals for eight different land cover types underline these results, contrasting DeepMask against CFMask outputs.
Example visualization of raw RGB image (1st row), ground truth labels (2nd row), CFMask results (3rd row), and DeepMask results (last row), for four land types: snow (1st column), water (2nd column), wetland (3rd column), and urban (last column).
Example visualization of raw image (RGB), ground truth labels, CFMask results, and DeepMask results, for four land types: crops, forest, barren, and shrubland.
Discussion
The paper attributes DeepMask's improved detection performance to the deep CNN's capacity to exploit spectral, spatial, and geometric information. Residual connections in ResNet contribute significantly to handling complex patches. The general model's performance demonstrates that DeepMask can handle varied land types, and because it does not require thermal bands, it can be applied to a broader range of satellites.
The results support the model's applicability to platforms with restricted spectral bands, including low-cost CubeSat missions. Further improvements may come from more training data and from additional temporal or thermal inputs. Future work could also improve computational efficiency and refine the algorithm.
Conclusion
DeepMask, a ResNet-based algorithm, demonstrates a significant improvement in cloud and shadow detection over traditional threshold-based methods such as CFMask. By combining spectral and visual cues, DeepMask offers a parsimonious, scalable approach for a variety of optical satellite imagery applications. Its tolerance of arbitrary input sizes and shapes, together with its flexibility across land types and band sets, makes it a practical advance for satellite image processing.