Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks (2102.00554v1)

Published 31 Jan 2021 in cs.LG, cs.AI, cs.AR, cs.CV, and cs.NE

Abstract: The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.

Citations (601)

Summary

  • The paper’s main contribution is a comprehensive survey of sparsity techniques showing that pruning can reduce model size by 10-100x without significant accuracy loss.
  • It details various pruning and growth methods applied during training and inference to balance computational efficiency with model performance.
  • The study highlights challenges in sparse model implementations and the need for hardware/software co-design for resource-constrained real-world applications.

Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training

The paper presents a comprehensive survey of sparsity in deep learning, focusing on techniques for pruning and growth to achieve efficient inference and training. Sparsity in neural networks offers significant reductions in memory and computational resources, aligning closely with the constraints typical in mobile and large-scale applications.

Key Areas of Focus:

  1. Sparsity Techniques:
    • The authors cover an extensive array of sparsification methods, distilling ideas from more than 300 research papers. They consider both removing and adding elements of neural networks, with pruning driven by criteria such as magnitude-based and gradient-based selection.
  2. Training and Inference:
    • Pruning schedules introduce sparsity at different phases, either after training or gradually during training. The survey stresses the importance of choosing which elements to remove so that computational gains come at minimal cost to accuracy (a minimal sketch combining magnitude pruning with a gradual schedule follows this list).
  3. Practical Implementation:
    • Exploiting sparse networks efficiently requires attention to the storage overhead of sparse formats and to what the target hardware can execute well. Blocked and structured sparsity formats are particularly useful for fast inference on resource-constrained hardware.
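The following is a minimal NumPy sketch of the two ideas above: unstructured magnitude-based pruning combined with a gradual (cubic) sparsity schedule of the kind surveyed in the paper. The function names, layer size, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def magnitude_prune_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Binary mask that keeps the largest-magnitude weights.
    `sparsity` is the fraction of weights to remove."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

def cubic_sparsity(step: int, total_steps: int, final_sparsity: float) -> float:
    """Gradual schedule: sparsity ramps from 0 to `final_sparsity` along a
    cubic curve, removing most weights early so the network can recover
    during the remaining training steps."""
    progress = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

# Toy run: prune one random "layer" to 90% sparsity over 10 pruning steps.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
for step in range(1, 11):
    target = cubic_sparsity(step, total_steps=10, final_sparsity=0.9)
    mask = magnitude_prune_mask(W, target)
    W = W * mask  # in real training, weight updates would happen between steps
    print(f"step {step:2d}: target sparsity {target:.2f}, actual {1 - mask.mean():.2f}")
```

In practice the mask is kept per layer and reapplied after optimizer steps; major frameworks ship pruning utilities for this, but the plain-NumPy version keeps the mechanics visible.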

Numerical Results and Bold Claims:

  • The paper highlights that existing sparsification methods can reduce model size by a factor of 10-100x without significant loss of accuracy, offering a practical path to deploying very large models on suitable hardware. Realizing corresponding speedups, however, demands dedicated hardware and software co-design, as the storage estimate sketched below suggests.
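As a back-of-the-envelope sketch (an illustration under assumed parameters, not a figure from the paper), the snippet below estimates the memory of an unstructured sparse layer stored in a plain CSR layout with 32-bit values and indices. The index overhead is why large compression factors require very high sparsity, and why structured formats and hardware support matter.

```python
def dense_bytes(rows: int, cols: int, value_bytes: int = 4) -> int:
    """Memory for a dense float32 weight matrix."""
    return rows * cols * value_bytes

def csr_bytes(rows: int, cols: int, sparsity: float,
              value_bytes: int = 4, index_bytes: int = 4) -> int:
    """Rough memory for the same matrix in CSR format:
    one value + one column index per nonzero, plus a row-pointer array."""
    nnz = round((1.0 - sparsity) * rows * cols)
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

rows = cols = 4096
for sparsity in (0.5, 0.9, 0.99):
    d = dense_bytes(rows, cols)
    s = csr_bytes(rows, cols, sparsity)
    print(f"sparsity {sparsity:.2f}: dense {d / 2**20:5.1f} MiB, "
          f"CSR {s / 2**20:5.1f} MiB, ratio {d / s:4.1f}x")
```

At 50% sparsity the CSR copy is no smaller than the dense matrix, and even at 99% sparsity the ratio stays near 50x because every surviving value carries an index with it; blocked and structured formats amortize that overhead, which is one motivation for the co-design the survey calls for.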

Implications and Future Directions:

  • Theoretical and Practical Balance: The paper underscores the need to refine sparsity techniques so that theoretical advances translate into practical gains. Understanding the short- and long-term effects of pruning on model performance and generalization is crucial.
  • Hardware Integration: With the end of Moore's Law and diminishing returns from hardware specialization, sparsity is poised to be a key enabler of computational efficiency for increasingly complex AI workloads.
  • Ongoing Challenges: Major open problems remain, including the co-design of sparse models with hardware architectures, achieving multi-objective optimization in pruning, and enhancing the robustness of sparsified models against adversarial attacks.

Outlook:

The paper foresees the continued evolution of sparse networks as deep learning models grow larger, emphasizing that sparsity may offer an immediate and powerful lever for efficiency. Seamless integration with hardware will likely become an essential aspect of future innovations, pushing the frontier of what can be achieved in AI systems.

In conclusion, the insights and methodologies presented give practitioners and researchers a clear path to harnessing sparsity in AI systems that must deliver accuracy under tight computational budgets.
