Land cover mapping at very high resolution with rotation equivariant CNNs: towards small yet accurate models (1803.06253v1)

Published 16 Mar 2018 in cs.CV

Abstract: In remote sensing images, the absolute orientation of objects is arbitrary. Depending on an object's orientation and on a sensor's flight path, objects of the same semantic class can be observed in different orientations in the same image. Equivariance to rotation, in this context understood as responding with a rotated semantic label map when subject to a rotation of the input image, is therefore a very desirable feature, in particular for high capacity models, such as Convolutional Neural Networks (CNNs). If rotation equivariance is encoded in the network, the model is confronted with a simpler task and does not need to learn specific (and redundant) weights to address rotated versions of the same object class. In this work we propose a CNN architecture called Rotation Equivariant Vector Field Network (RotEqNet) to encode rotation equivariance in the network itself. By using rotating convolutions as building blocks and passing only the values corresponding to the maximally activating orientation throughout the network in the form of orientation encoding vector fields, RotEqNet treats rotated versions of the same object with the same filter bank and therefore achieves state-of-the-art performances even when using very small architectures trained from scratch. We test RotEqNet in two challenging sub-decimeter resolution semantic labeling problems, and show that we can perform better than a standard CNN while requiring one order of magnitude fewer parameters.

Summary

The paper’s main contribution is RotEqNet, a novel CNN that encodes rotation equivariance to accurately classify sub-decimeter imagery.
It employs rotating convolutional filters and vector activations to reduce parameters while maintaining state-of-the-art performance.
Empirical evaluations on ISPRS Vaihingen and Zeebruges datasets confirm its high accuracy in detecting key features like vehicles and building boundaries.

Overview of Land Cover Mapping with Rotation Equivariant CNNs

The paper "Land cover mapping at very high resolution with rotation equivariant CNNs: towards small yet accurate models" introduces a novel approach to accurately classify pixels in sub-decimeter resolution imagery using a rotation equivariant convolutional neural network (CNN) architecture, termed the Rotation Equivariant Vector Field Network (RotEqNet). This work addresses challenges inherent in the variability of object orientation within high-resolution remote sensing images by encoding rotation equivariance directly within the CNN architecture.

The essential innovation of RotEqNet is its ability to handle arbitrary object orientations, which are the norm in overhead imagery. It does so by using rotating convolutions as its basic layers and by passing on, at each location, only the response at the maximally activating orientation together with that orientation. The model therefore processes differently oriented instances of the same object with a single filter bank, which significantly reduces the number of parameters compared to conventional CNNs while maintaining state-of-the-art performance.
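The property being encoded can be stated compactly. The notation below is introduced here for illustration and is not taken from the paper: $f$ stands for the network mapping an image $x$ to a semantic label map, and $R_\theta$ for a rotation by angle $\theta$ applied to either an image or a label map.

```latex
\begin{align}
\text{equivariance:}\quad & f(R_\theta\, x) = R_\theta\, f(x) \\
\text{invariance:}\quad   & f(R_\theta\, x) = f(x)
\end{align}
```

Semantic labeling calls for the first property: rotating the input image should produce a correspondingly rotated label map, not an unchanged one.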

Methodological Contribution

RotEqNet differs from traditional CNNs by using rotating convolutional filters: each filter is applied at multiple orientations in every forward pass. The responses of one filter form a tensor with one map per orientation, from which the network keeps, at every pixel, only the maximum response and the orientation that produced it. The result is a compact vector field per filter. Because rotation equivariance is encoded in the architecture itself, the model does not have to learn it from rotated training examples, as is the case when data augmentation is used to make a standard CNN robust to rotation.
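The rotate-then-keep-the-maximum step can be sketched in a few lines of NumPy. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation: it uses a single scalar input channel, only four orientations (multiples of 90 degrees, so that `np.rot90` can rotate the filter without interpolation), and hypothetical function names `correlate2d_valid` and `rot_conv_max`.

```python
import numpy as np

N_ORIENTATIONS = 4  # simplification: multiples of 90 degrees only


def correlate2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, written out explicitly."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


def rot_conv_max(image, kernel):
    """Apply one filter at four orientations; keep the max response and its angle."""
    # One response map per rotated copy of the same filter (shared weights).
    responses = np.stack([
        correlate2d_valid(image, np.rot90(kernel, k))
        for k in range(N_ORIENTATIONS)
    ])
    best = responses.argmax(axis=0)       # index of the winning orientation
    magnitude = responses.max(axis=0)     # strongest response per pixel
    angle = best * (2 * np.pi / N_ORIENTATIONS)  # winning orientation in radians
    return magnitude, angle
```

With this sketch, rotating the input by 90 degrees rotates the magnitude map by 90 degrees and shifts every stored angle by a quarter turn, which is the equivariant behaviour described above, here restricted to the four sampled orientations.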

RotEqNet is also distinguished by its compact architecture. Despite the reduced parameter count, the network remains expressive because it uses vector activations, which store both the magnitude and the direction of the maximum activation at each pixel and therefore carry more information than the scalar activations of conventional CNNs.
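To make the idea of a vector activation concrete, the sketch below turns a per-pixel magnitude and orientation into a two-component field. The function name `to_vector_field` is hypothetical, and clipping negative magnitudes to zero before the conversion is an assumption made here for illustration, not a detail stated in this summary.

```python
import numpy as np


def to_vector_field(magnitude, angle):
    """Convert per-pixel (magnitude, angle) into a 2-channel vector field (u, v)."""
    # Assumption: negative responses are clipped, so vector lengths are non-negative.
    mag = np.maximum(magnitude, 0.0)
    u = mag * np.cos(angle)  # horizontal component
    v = mag * np.sin(angle)  # vertical component
    return np.stack([u, v])  # shape: (2, height, width)
```

A scalar activation would keep only `mag`; the two components additionally record which orientation of the filter produced the response, so later layers can reason about relative orientations.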

Empirical Evaluation

The effectiveness of RotEqNet is demonstrated through empirical evaluations on two high-resolution remote sensing datasets: the ISPRS Vaihingen and the Zeebruges benchmarks. On both datasets, RotEqNet achieves high classification accuracy and does particularly well on classes defined by distinct boundaries, such as vehicles and buildings. The results indicate performance that is competitive with, if not superior to, that of significantly larger standard CNNs, which supports encoding rotation equivariance directly in the network.

Notably, RotEqNet required roughly one order of magnitude fewer parameters than a traditional CNN to reach comparable performance, which is a practical advantage when labeled data and computational resources are limited.
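As a rough illustration of how an order-of-magnitude saving can arise, the weight count of a convolutional layer grows with the product of its input and output widths, so a network that needs about three times fewer filters per layer has about ten times fewer parameters. The layer widths below are hypothetical and are not the architectures evaluated in the paper; the count also ignores the two-component vector inputs that RotEqNet layers would have.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of one k x k convolutional layer."""
    return c_in * c_out * k * k + c_out


def network_params(widths, c_in=3, k=3):
    """Total parameters of a plain stack of k x k convolutions."""
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out, k)
        c_in = c_out  # the next layer consumes this layer's outputs
    return total


# Hypothetical widths, chosen only to illustrate the scaling.
wide = network_params([64, 128, 256])
narrow = network_params([20, 40, 80])
ratio = wide / narrow
```

Here `ratio` comes out close to 10 even though each layer is only 3.2 times narrower, because the dominant term scales with the square of the width.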

Conclusion and Future Directions

This research contributes to remote sensing and computer vision by presenting a framework for land cover mapping with built-in rotation equivariance, which could be valuable for object detection and classification in aerial imagery. By reducing the amount of labeled ground truth data required and keeping the architecture small, RotEqNet is a compelling alternative to conventional deep learning approaches wherever object orientation is arbitrary.

Future work could adapt RotEqNet to other tasks, such as three-dimensional object detection, real-time processing, and integration with multi-modal data sources. Combining rotation equivariance with equivariance to other geometric transformations could yield further reductions in model size and better feature learning. The framework thus invites applications beyond semantic labeling, in any domain where convolutional architectures must handle arbitrary object orientations.
