Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 45 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 11 tok/s Pro
GPT-5 High 19 tok/s Pro
GPT-4o 88 tok/s Pro
Kimi K2 214 tok/s Pro
GPT OSS 120B 460 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

SPDY: Accurate Pruning with Speedup Guarantees (2201.13096v2)

Published 31 Jan 2022 in cs.LG

Abstract: The recent focus on the efficiency of deep neural networks (DNNs) has led to significant work on model compression approaches, of which weight pruning is one of the most popular. At the same time, there is rapidly-growing computational support for efficiently executing the unstructured-sparse models obtained via pruning. Yet, most existing pruning methods minimize just the number of remaining weights, i.e. the size of the model, rather than optimizing for inference time. We address this gap by introducing SPDY, a new compression method which automatically determines layer-wise sparsity targets achieving a desired inference speedup on a given system, while minimizing accuracy loss. SPDY is composed of two new techniques: the first is an efficient dynamic programming algorithm for solving the speedup-constrained layer-wise compression problem assuming a set of given layer-wise sensitivity scores; the second is a local search procedure for determining accurate layer-wise sensitivity scores. Experiments across popular vision and LLMs show that SPDY guarantees speedups while recovering higher accuracy relative to existing strategies, both for one-shot and gradual pruning scenarios, and is compatible with most existing pruning approaches. We also extend our approach to the recently-proposed task of pruning with very little data, where we achieve the best known accuracy recovery when pruning to the GPU-supported 2:4 sparsity pattern.

Citations (28)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.