- The paper introduces RevNet, an architecture that eliminates the need to store activations during backpropagation, significantly reducing memory consumption.
- It employs invertible transformations in each layer to reconstruct inputs during the backward pass, achieving performance comparable to traditional ResNets.
- Experimental evaluations demonstrate that RevNets match ResNet accuracy while keeping activation storage roughly constant in depth, enabling deeper networks to be trained and broadening applicability in memory-constrained environments.
The Reversible Residual Network: Backpropagation Without Storing Activations
The paper "The Reversible Residual Network: Backpropagation Without Storing Activations" introduces a significant advancement in neural network training by focusing on memory efficiency during backpropagation. The authors, Aidan N. Gomez, Mengye Ren, Raquel Urtasun, and Roger B. Grosse, propose a novel architecture that mitigates the constraints imposed by storing intermediate activations in traditional training processes. This research originates from the University of Toronto and Uber Advanced Technologies Group, reflecting an intersection of academic inquiry and industrial application.
Core Contribution
The primary contribution of this work is the Reversible Residual Network (RevNet), a modification of the conventional residual network (ResNet) in which each layer's activations can be reconstructed exactly from the activations of the layer above. Backpropagation can therefore proceed without storing the activations of most layers, addressing a major memory bottleneck in training deep models. The design draws inspiration from reversible computing and takes the form of the additive coupling shown below.
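Concretely, each reversible block partitions its input channels into two groups, x1 and x2, and applies two residual functions F and G in an additive coupling; inverting the coupling recovers the inputs exactly from the outputs:

```latex
% Forward computation of a reversible block (additive coupling)
y_1 = x_1 + F(x_2), \qquad y_2 = x_2 + G(y_1)

% Inverse: the block's inputs are recovered exactly from its outputs
x_2 = y_2 - G(y_1), \qquad x_1 = y_1 - F(x_2)
```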
The architecture reduces the memory footprint of training without an appreciable sacrifice in accuracy. Because each reversible block applies an invertible transformation, its input activations can be recomputed from its outputs during the backward pass, so they never need to be cached in the forward pass. The memory freed this way is especially valuable in resource-constrained settings; a minimal code sketch of the reconstruction follows.
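The sketch below is a toy NumPy illustration of the coupling, not the paper's implementation: `f`, `g`, `Wf`, and `Wg` are placeholder residual functions and parameters standing in for the convolutional residual functions used in RevNet.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # channels per partition (toy size)
Wf = rng.standard_normal((d, d)) * 0.1  # placeholder parameters for F
Wg = rng.standard_normal((d, d)) * 0.1  # placeholder parameters for G

def f(x):                               # placeholder residual function F
    return np.tanh(Wf @ x)

def g(x):                               # placeholder residual function G
    return np.tanh(Wg @ x)

def forward(x1, x2):
    """Additive coupling: the outputs fully determine the inputs."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    """Reconstruct the inputs from the outputs -- nothing was cached."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
y1, y2 = forward(x1, x2)
rx1, rx2 = inverse(y1, y2)
assert np.allclose(x1, rx1) and np.allclose(x2, rx2)  # exact up to float error
```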
Experimental Evaluation
The experimental results validate RevNet's efficacy. On CIFAR-10, CIFAR-100, and ImageNet, RevNets reach accuracy nearly identical to ResNets of comparable size while storing far fewer activations: since each block's inputs are recomputed rather than cached, activation storage for the reversible portion of the network does not grow with depth. Within a fixed memory budget, this makes it feasible to train deeper networks or to process larger inputs than before.
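These savings come from how the backward pass is carried out: for each reversible block, the inputs are first reconstructed from the outputs, and gradients are then propagated through the recomputed block. The sketch below illustrates the pattern with deliberately simple linear residual functions, an assumption made here only so the vector-Jacobian products can be written out by hand; the paper's residual functions are convolutional subnetworks and their gradients are obtained with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Wf = rng.standard_normal((d, d)) * 0.1   # toy linear residual F(x) = Wf @ x
Wg = rng.standard_normal((d, d)) * 0.1   # toy linear residual G(x) = Wg @ x

def forward(x1, x2):
    y1 = x1 + Wf @ x2
    y2 = x2 + Wg @ y1
    return y1, y2

def backward_from_outputs(y1, y2, dy1, dy2):
    """Backward pass of one reversible block using only its outputs.

    Reconstructs the inputs, then applies the chain rule through the
    additive coupling; no activations were stored in the forward pass.
    """
    # 1) Reconstruct the inputs (this is what reversibility buys us).
    x2 = y2 - Wg @ y1
    x1 = y1 - Wf @ x2

    # 2) Gradients w.r.t. inputs (for linear F, G the VJP is a transpose).
    dz1 = dy1 + Wg.T @ dy2   # total gradient flowing into y1
    dx2 = dy2 + Wf.T @ dz1
    dx1 = dz1

    # 3) Parameter gradients.
    dWf = np.outer(dz1, x2)
    dWg = np.outer(dy2, y1)
    return (x1, x2), (dx1, dx2), (dWf, dWg)

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
y1, y2 = forward(x1, x2)
dy1, dy2 = rng.standard_normal(d), rng.standard_normal(d)  # placeholder upstream gradients
recon, grads_in, grads_w = backward_from_outputs(y1, y2, dy1, dy2)
assert np.allclose(recon[0], x1) and np.allclose(recon[1], x2)
```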
Theoretical and Practical Implications
Theoretically, this research underscores the potential of invertible transformations in deep learning, broadening the set of techniques available for neural architecture design. Practically, RevNets allow deeper models to be trained on existing hardware without a proportional increase in memory, which matters wherever memory is the limiting factor, such as edge computing or mobile devices.
Future Directions
The paper briefly explores avenues for future work, including extending the reversible approach to other network architectures and reducing the computational overhead of reconstruction (the backward pass requires roughly one additional forward computation per reversible block). Hybrid models that mix conventional and reversible layers could offer a tunable balance between memory usage and computational cost.
The broader implications of this innovation point towards an era where memory-efficient training methods become standard practice, aligning well with trends towards increasingly large-scale and complex models in AI research.
Overall, this paper presents a compelling advance in neural network architecture, offering a practical tool for researchers and practitioners who need to overcome memory limitations without sacrificing accuracy. The Reversible Residual Network stands as a significant contribution to the field, opening a path toward more memory-efficient model training.