Harmonic Networks: Deep Translation and Rotation Equivariance (1612.04642v2)

Published 14 Dec 2016 in cs.CV, cs.LG, and stat.ML

Abstract: Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360° rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and low computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.

Citations (675)

Summary

The paper demonstrates that integrating steerable circular harmonic filters into CNNs yields continuous rotation and translation equivariance, achieving a 1.69% test error on rotated-MNIST.
The H-Net architecture constrains filters to complex circular harmonics with learnable radial profiles, ensuring predictable feature transformations at all depths without relying on extensive data augmentation.
Experiments report a state-of-the-art result on rotated-MNIST and the best boundary-detection results among non-pretrained models on BSD500; the authors also argue that the constraint improves interpretability and sample efficiency.

Harmonic Networks: Deep Translation and Rotation Equivariance

The paper "Harmonic Networks: Deep Translation and Rotation Equivariance" introduces Harmonic Networks (H-Nets), a novel convolutional neural network (CNN) architecture designed to achieve equivariance to both translations and 360-degree rotations for image recognition tasks. The authors aim to address the limitations of traditional CNNs, which are inherently translation equivariant but lack rotation equivariance—a property often sought through data augmentation.

Core Concept and Methodology

H-Nets replace conventional CNN filters with circular harmonics. These are steerable filters: the response at any orientation can be computed from a fixed filter set, so the network can return the maximal response and its orientation for every receptive field patch. As a result, feature maps transform predictably under image rotations, giving the network the same kind of equivariance to rotations that CNNs naturally have to translations.
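To make the "predictable transformation" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical filter with radial profile r^m · exp(-r²/2σ²), a synthetic two-blob patch (`blobs`), and a plain sum in place of a full cross-correlation. All function names are made up for illustration, and the sign of the phase shift depends on the convention chosen here. Under these assumptions, rotating the patch by θ should leave the magnitude of the order-m response unchanged and shift its phase by mθ.

```python
import numpy as np

def circular_harmonic(m, xs, ys, sigma=1.0):
    """Order-m circular harmonic: radial profile times exp(i*m*phi).
    The radial profile is an illustrative assumption, not the paper's."""
    r = np.hypot(xs, ys)
    phi = np.arctan2(ys, xs)
    radial = r**m * np.exp(-r**2 / (2 * sigma**2))
    return radial * np.exp(1j * m * phi)

def blobs(x, y):
    """Synthetic smooth patch: two off-centre Gaussian blobs."""
    b1 = np.exp(-((x - 0.8)**2 + (y - 0.3)**2) / (2 * 0.4**2))
    b2 = 0.5 * np.exp(-((x + 0.5)**2 + (y - 0.9)**2) / (2 * 0.5**2))
    return b1 + b2

def rotated_blobs(x, y, theta):
    """The same patch rotated counter-clockwise by theta about the origin,
    obtained by sampling the original at inversely rotated coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    return blobs(c * x + s * y, -s * x + c * y)

def response(m, theta):
    """Filter response at the patch centre for a patch rotated by theta."""
    grid = np.linspace(-5.0, 5.0, 101)
    xs, ys = np.meshgrid(grid, grid)
    return np.sum(circular_harmonic(m, xs, ys) * rotated_blobs(xs, ys, theta))

if __name__ == "__main__":
    theta = 0.7
    for m in (0, 1, 2):
        z0, z1 = response(m, 0.0), response(m, theta)
        # Columns: order, magnitude ratio, measured phase shift, m * theta
        print(m, abs(z1) / abs(z0), np.angle(z1 * np.conj(z0)), m * theta)
```

The order-0 response is fully invariant, while higher orders carry the rotation angle in their phase, which is what lets a network recover orientation rather than discard it.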

The authors constrain each filter to the circular harmonic family: a complex-valued filter defined by a learnable radial profile and phase offset, together with a fixed integer rotation order. They show that chained cross-correlations have a cumulative rotation order: the orders of successive filters add, so a rotation of the input appears at the output as a predictable phase shift. The H-Net architecture organizes feature maps into streams of different rotation orders and maintains this equivariance at all network depths without needing separate rotated copies of input images or filters.
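The order-summing property can be written out as a short sketch. The notation is chosen here for illustration: $F^{\theta}$ denotes a patch rotated by $\theta$ about its centre, $W_m$ a filter of rotation order $m$ with radial profile $R$ and phase offset $\beta$, and $\star$ cross-correlation evaluated at the centre of rotation. The sign of the exponent depends on the rotation convention.

```latex
% Filter of rotation order m (R and \beta are the learnable parts)
W_m(r, \phi) = R(r)\, e^{i(m\phi + \beta)}

% A single cross-correlation turns a rotation into a phase shift
[W_m \star F^{\theta}] = e^{i m \theta}\, [W_m \star F^{0}]

% Chaining two filters: the rotation orders add
[W_{m_2} \star [W_{m_1} \star F^{\theta}]]
  = e^{i (m_1 + m_2) \theta}\, [W_{m_2} \star [W_{m_1} \star F^{0}]]
```

Read this way, the rotation order of a stream is the sum of the filter orders along the path that produced it, so feature maps can be combined consistently only when every path leading to them has the same total order.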

Numerical Results and Claims

The authors evaluate H-Nets on rotated-MNIST classification and on boundary detection with the Berkeley Segmentation Dataset (BSD500). They report a new state-of-the-art test error of 1.69% on rotated-MNIST and the best BSD500 results among non-pretrained models.

Theoretical and Practical Implications

Theoretically, H-Nets demonstrate that it is viable to hard-bake continuous rotation equivariance into neural network architectures using steerable filters. This constraint narrows the hypothesis space of learnable models, potentially leading to more sample-efficient learning. Practically, H-Nets reduce the need for extensive rotation augmentation, which simplifies training, and they aid interpretability: because feature maps transform predictably under rotation, learned features are easier to understand across orientations.

Future Directions

Future research could involve extending H-Nets to accommodate other transformations or applications, such as 3D data or more complex transformation groups. Exploring the computational efficiency and scalability of H-Nets in larger-scale applications could further validate their utility.

In conclusion, this work contributes an approach to building rotation equivariance directly into neural networks, aiming for more reliable and interpretable models with less reliance on large datasets or rotation augmentation.
