Structured Pruning of Deep Convolutional Neural Networks (1512.08571v1)

Published 29 Dec 2015 in cs.NE, cs.LG, and stat.ML

Abstract: Real time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks, which are channel wise, kernel wise and intra kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, parallel computing environments and hardware based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by computing the misclassification rate with corresponding connectivity pattern. The pruned network is re-trained to compensate for the losses due to pruning. While implementing convolutions as matrix products, we particularly show that intra kernel strided sparsity with a simple constraint can significantly reduce the size of kernel and feature map matrices. The pruned network is finally fixed point optimized with reduced word length precision. This results in significant reduction in the total storage size providing advantages for on-chip memory based implementations of deep neural networks.

Citations (709)

View on Semantic Scholar

Summary

The paper introduces a structured pruning approach that applies channel-, kernel-, and intra-kernel sparsity to reduce CNN model complexity.
The methodology leverages particle filtering with evolutionary enhancements to select critical network connections before retraining and quantization.
Experimental results on MNIST and CIFAR-10 show maintained performance with significant reductions in computational load and memory size.

Structured Pruning of Deep Convolutional Neural Networks

Overview

The paper "Structured Pruning of Deep Convolutional Neural Networks" by Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung addresses the challenge of optimizing deep learning models for real-time applications, particularly on resource-constrained devices. The authors introduce a structured pruning methodology for Convolutional Neural Networks (CNNs) that incorporates channel-wise, kernel-wise, and intra-kernel strided sparsity. Their approach leverages a particle filtering technique to identify the most critical network connections, which are subsequently pruned and then retrained to compensate for lost performance. Furthermore, the pruned network is quantized, significantly reducing its storage size and enhancing suitability for embedded systems.

Methodology

The primary contribution of the paper is the introduction of structured sparsity at different granularities—channel, kernel, and intra-kernel levels. The method involves:

Pruning Granularities:
- Channel-Level Pruning: Eliminates entire channels (feature maps), thereby reducing the network's dimensions directly.
- Kernel-Level Pruning: Removes entire k×k kernels from the network.
- Intra-Kernel Strided Pruning: Introduces zero-valued weights at specific, well-defined locations within kernels to enforce sparsity patterns that are computationally advantageous.
Particle Filtering Approach:
- Selection of Pruning Candidates: Utilizes a particle filtering method to simulate different pruning patterns and evaluates their impact on network performance. The misclassification rate (MCR) is used to assign importance weights to different particles, guiding the pruning process.
- Evolutionary Particle Filter (EPF): Enhances the particle filter by integrating elements of genetic algorithms, maintaining diversity among pruning candidates and improving the overall efficacy of the pruning process.
Retraining and Fixed-Point Optimization:
- After pruning, the network is retrained to mitigate any performance degradation. Fixed-point optimization reduces the memory requirements by quantizing the network weights to lower precision while preserving performance.

Experimental Results

The authors conducted experiments on MNIST and CIFAR-10 datasets to validate their pruning approach. Key findings include:

Performance Retention: The pruned networks maintained comparable performance to the baseline while achieving significant parameter reductions. For instance, a network pruned to 1-20-20-20-20-500-10 configuration on MNIST retained its performance with only a marginal increase in MCR.
Channel and Kernel Pruning: Significantly reduced the number of convolution connections while maintaining performance.
Intra-Kernel Strided Pruning: Showed potential to further reduce computational complexity when combined with convolution lowering techniques.

Implications

Practically, this structured pruning strategy is advantageous for deploying deep learning models on embedded systems and parallel computing environments. The reduced model size not only facilitates on-chip memory storage but also minimizes energy consumption due to fewer DRAM accesses. Theoretically, the structured pruning framework provides a robust approach to balance model complexity and efficiency.

Future Directions

Future research could explore:

Extended Profiling: Detailed examination of execution time benefits conferred by reduced convolution layer complexity.
Network Parameter Space Exploration: Application of SMC techniques for thorough exploration of network configuration spaces.
Advanced Quantization Techniques: Further optimization methods to complement the pruning strategies for even more efficient deep learning models.

In summary, the structured pruning methodology presented in this paper offers a practical and theoretically sound approach to optimize the deployment of deep learning models in resource-constrained environments. The innovative use of particle filters and structured sparsity makes this a valuable contribution to the field of neural network optimization.