Making Convolutional Networks Shift-Invariant Again (1904.11486v2)

Published 25 Apr 2019 in cs.CV and cs.LG

Abstract: Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldomly used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification, across several commonly-used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservingly overlooked in modern deep networks. Code and anti-aliased versions of popular networks are available at https://richzhang.github.io/antialiased-cnns/ .


Summary

The paper shows that modern CNNs are not shift-invariant, because downsampling operations such as strided convolution and max-pooling subsample without anti-aliasing.
It inserts a low-pass (blur) filter before the subsampling step of these layers, which mitigates aliasing and improves robustness to translations.
Empirical evaluations on ImageNet and CIFAR-10 show accuracy gains and improved stability under small shifts.


Overview

The paper "Making Convolutional Networks Shift-Invariant Again" by Richard Zhang addresses a critical problem in convolutional neural networks (CNNs): their lack of shift invariance. Stride-1 convolution is shift-equivariant (up to boundary effects). However, the downsampling operations used throughout modern architectures, such as strided convolution, max-pooling, and average-pooling, subsample without regard to the sampling theorem and so break this property. The paper's goal is to restore shift invariance by anti-aliasing these downsampling steps.
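The following minimal NumPy sketch makes the problem concrete. It uses a toy periodic signal of our own choosing, and the function names `max_pool_1d` and `circ_conv` are illustrative rather than taken from the paper's code. Stride-1 circular convolution commutes with a shift. Stride-2 max-pooling does not: moving the input by a single sample gives a different output.

```python
import numpy as np


def max_pool_1d(x, k=2, s=2):
    """Max-pool a 1D signal with kernel size k and stride s (no padding)."""
    return np.array([x[i:i + k].max() for i in range(0, len(x) - k + 1, s)])


def circ_conv(x, w):
    """Stride-1 circular correlation: y[i] = sum_j w[j] * x[(i + j) % n]."""
    n = len(x)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(len(w))) for i in range(n)])


x = np.array([0, 0, 1, 1, 0, 0, 1, 1])
x_shift = np.roll(x, -1)  # the same signal shifted left by one sample (circularly)

# Stride-1 convolution is shift-equivariant: shifting the input shifts the output.
w = np.array([1.0, -1.0, 0.5])
print(np.allclose(circ_conv(x_shift, w), np.roll(circ_conv(x, w), -1)))  # True

# Stride-2 max-pooling is not: a one-sample shift changes every other output value.
print(max_pool_1d(x))        # [0 1 0 1]
print(max_pool_1d(x_shift))  # [1 1 1 1]
```

Subsampling by 2 keeps only every second sample, so which values survive depends on how the input is aligned. That alignment dependence is the aliasing the paper sets out to reduce.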

Core Contributions

The paper makes the following key contributions:

Problem Identification: The paper begins by showing that modern CNNs, contrary to common assumption, are not shift-invariant: shifting the input by even a few pixels can change the output drastically. This matters because convolution is shift-equivariant, and networks built from it are widely assumed to inherit robustness to translations.
Anti-Aliasing Technique: To address the problem, the paper applies the classical signal-processing remedy of low-pass filtering before subsampling. It does not replace existing layers. Instead, it inserts a blur filter between an operation's dense evaluation and its subsampling step. For max-pooling, for example, the max is computed densely, then blurred, then subsampled. This mitigates the aliasing responsible for the loss of shift invariance while keeping components such as max-pooling and strided convolution.
Theoretical Analysis: The paper grounds its analysis in the sampling theorem, explaining why naive subsampling causes aliasing and degrades shift invariance. It decomposes common downsampling layers (max-pooling, strided convolution, average-pooling) into a dense operation followed by subsampling, which shows exactly where a low-pass filter can be inserted.
Empirical Evaluation: Experiments support these claims. Models with the anti-aliasing filters show consistent improvements on benchmarks including classification on ImageNet and CIFAR-10. The gains are most pronounced in robustness to small translations, which matters for real-world applications. The anti-aliased models also generalize better in terms of stability and robustness to input corruptions.
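The decomposition behind the fix can be sketched in 1D as follows. Standard max-pooling (kernel 2, stride 2) equals a dense, stride-1 max followed by naive subsampling. The anti-aliased variant inserts a blur between those two steps. This sketch assumes NumPy, circular padding, and a normalized [1, 2, 1]/4 blur kernel chosen for illustration. The helper names (`dense_max`, `blur`, `subsample`, `max_blur_pool`) are hypothetical and are not the paper's API. The same pattern applies to strided convolution: dense convolution and ReLU, then blur, then subsample.

```python
import numpy as np


def dense_max(x, k=2):
    """Max over each length-k window at stride 1 (circular padding): no downsampling yet."""
    n = len(x)
    return np.array([max(x[(i + j) % n] for j in range(k)) for i in range(n)])


def blur(x, filt=(1.0, 2.0, 1.0)):
    """Low-pass filter with a normalized, centered kernel (circular padding)."""
    f = np.asarray(filt) / np.sum(filt)
    n, r = len(x), len(f) // 2
    return np.array([sum(f[j] * x[(i + j - r) % n] for j in range(len(f))) for i in range(n)])


def subsample(x, s=2):
    """Naive subsampling: keep every s-th sample."""
    return x[::s]


def max_pool(x):
    # Standard max-pool (k=2, s=2) rewritten as dense max -> subsample.
    return subsample(dense_max(x))


def max_blur_pool(x):
    # Anti-aliased variant: dense max -> low-pass blur -> subsample.
    return subsample(blur(dense_max(x)))


x = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
x_shift = np.roll(x, -1)

# Largest change in any output value caused by a one-sample input shift.
change_plain = np.abs(max_pool(x) - max_pool(x_shift)).max()
change_blur = np.abs(max_blur_pool(x) - max_blur_pool(x_shift)).max()
print(change_plain, change_blur)  # 1.0 0.25
```

The blurred output still changes, because subsampling by 2 cannot make it exactly invariant, but it changes far less. Anti-aliasing therefore improves shift invariance rather than guaranteeing it.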

Numerical Results

The paper reports the following results:

ImageNet Classification: Adding anti-aliasing filters increased top-1 accuracy across several commonly used architectures, including ResNet, DenseNet, and MobileNet. The anti-aliased models were also more consistent, meaning they more often assigned the same class to two shifted versions of the same image.
CIFAR-10 Stability: On CIFAR-10, anti-aliasing improved robustness to input shifts. Classification of translated images remained more stable than with the baseline models.

Implications and Future Work

This research is practically relevant for developers and researchers who use CNNs in applications where robustness to small shifts and translations is critical, such as image recognition, scene understanding, and autonomous driving. By improving shift invariance, anti-aliasing makes models more reliable and their predictions more stable across varied input conditions.

Theoretically, this work calls for re-evaluating existing CNN architectures. It suggests that anti-aliasing should be an integral part of downsampling operations, so that networks better preserve the shift-equivariance of convolution.

Future research directions may explore:

Extending the anti-aliasing technique to other forms of invariance beyond translations.
Investigating the effect of anti-aliasing in deeper and more complex neural network architectures.
Studying the role of anti-aliasing in other domains such as sequential data and video processing, where temporal shift-invariance could be beneficial.

Conclusion

"Making Convolutional Networks Shift-Invariant Again" provides a methodologically sound and empirically validated approach to a long-overlooked issue in modern CNN architectures. By reintroducing classical anti-aliasing into downsampling layers, the paper improves accuracy and recovers much of the shift invariance that naive subsampling had destroyed. The approach could benefit a wide range of applications in computer vision and beyond.
