Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 72 tok/s

Gemini 2.5 Pro 57 tok/s Pro

GPT-5 Medium 43 tok/s Pro

GPT-5 High 23 tok/s Pro

GPT-4o 107 tok/s Pro

Kimi K2 219 tok/s Pro

GPT OSS 120B 465 tok/s Pro

Claude Sonnet 4 39 tok/s Pro

2000 character limit reached

SlimNets: An Exploration of Deep Model Compression and Acceleration (1808.00496v1)

Published 1 Aug 2018 in cs.LG, cs.AI, and stat.ML

Abstract: Deep neural networks have achieved increasingly accurate results on a wide variety of complex tasks. However, much of this improvement is due to the growing use and availability of computational resources (e.g use of GPUs, more layers, more parameters, etc). Most state-of-the-art deep networks, despite performing well, over-parameterize approximate functions and take a significant amount of time to train. With increased focus on deploying deep neural networks on resource constrained devices like smart phones, there has been a push to evaluate why these models are so resource hungry and how they can be made more efficient. This work evaluates and compares three distinct methods for deep model compression and acceleration: weight pruning, low rank factorization, and knowledge distillation. Comparisons on VGG nets trained on CIFAR10 show that each of the models on their own are effective, but that the true power lies in combining them. We show that by combining pruning and knowledge distillation methods we can create a compressed network 85 times smaller than the original, all while retaining 96% of the original model's accuracy.

Citations (11)

View on Semantic Scholar