Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 44 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 13 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 86 tok/s Pro
Kimi K2 208 tok/s Pro
GPT OSS 120B 447 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

Activation Density driven Energy-Efficient Pruning in Training (2002.02949v2)

Published 7 Feb 2020 in cs.LG, cs.CV, cs.NE, and stat.ML

Abstract: Neural network pruning with suitable retraining can yield networks with considerably fewer parameters than the original with comparable degrees of accuracy. Typical pruning methods require large, fully trained networks as a starting point from which they perform a time-intensive iterative pruning and retraining procedure to regain the original accuracy. We propose a novel pruning method that prunes a network real-time during training, reducing the overall training time to achieve an efficient compressed network. We introduce an activation density based analysis to identify the optimal relative sizing or compression for each layer of the network. Our method is architecture agnostic, allowing it to be employed on a wide variety of systems. For VGG-19 and ResNet18 on CIFAR-10, CIFAR-100, and TinyImageNet, we obtain exceedingly sparse networks (up to $200 \times$ reduction in parameters and over $60 \times$ reduction in inference compute operations in the best case) with accuracy comparable to the baseline network. By reducing the network size periodically during training, we achieve total training times that are shorter than those of previously proposed pruning methods. Furthermore, training compressed networks at different epochs with our proposed method yields considerable reduction in training compute complexity ($1.6\times$ to $3.2\times$ lower) at near iso-accuracy as compared to a baseline network trained entirely from scratch.

Citations (4)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.