Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks

Published 21 Aug 2018 in cs.CV | (1808.06866v1)

Abstract: This paper proposed a Soft Filter Pruning (SFP) method to accelerate the inference procedure of deep Convolutional Neural Networks (CNNs). Specifically, the proposed SFP enables the pruned filters to be updated when training the model after pruning. SFP has two advantages over previous works: (1) Larger model capacity. Updating previously pruned filters provides our approach with larger optimization space than fixing the filters to zero. Therefore, the network trained by our method has a larger model capacity to learn from the training data. (2) Less dependence on the pre-trained model. Large capacity enables SFP to train from scratch and prune the model simultaneously. In contrast, previous filter pruning methods should be conducted on the basis of the pre-trained model to guarantee their performance. Empirically, SFP from scratch outperforms the previous filter pruning methods. Moreover, our approach has been demonstrated effective for many advanced CNN architectures. Notably, on ILSVRC-2012, SFP reduces more than 42% FLOPs on ResNet-101 with even 0.2% top-5 accuracy improvement, which has advanced the state-of-the-art. Code is publicly available on GitHub: https://github.com/he-y/soft-filter-pruning

Summary

The paper introduces Soft Filter Pruning (SFP) which prunes filters based on their ℓ2-norm while allowing updates during training.
The method achieves notable FLOPs reduction on benchmarks like CIFAR-10 and ILSVRC-2012 with minimal accuracy drop.
SFP simplifies training by reducing reliance on pre-trained models, enabling efficient deployment on resource-constrained devices.

In the paper "Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks," the authors, Yang He et al., address the computational inefficiency of deep Convolutional Neural Networks (CNNs) with a novel approach termed Soft Filter Pruning (SFP). Traditional methods often face limitations such as reduced model capacity and reliance on pre-trained models. SFP introduces a paradigm where pruned filters are allowed to be updated during training, thus maintaining a higher model capacity and enabling training from scratch.

The authors highlight two primary advantages of SFP:

Larger Model Capacity: By allowing previously pruned filters to be updated, the model retains a larger optimization space compared to methods that fix pruned filters' values to zero. This effectively results in networks with higher learning capability.
Reduced Dependence on Pre-trained Models: Unlike previous methodologies requiring fine-tuning based on pre-trained models, SFP can simultaneously perform pruning and training from scratch. This significantly simplifies the overall model development process.

Methodology

SFP operates by dynamically pruning filters with low $\ell_2$-norm values at the end of each training epoch. The procedure is iterative: filters are pruned, the network is trained for an epoch, and the process repeats. Importantly, pruned filters are not removed; instead, they are set to zero but continue to be updated during the training process. This differs significantly from traditional hard filter pruning, where pruned filters are permanently discarded and not updated, leading to a loss of model capacity.
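The difference between soft and hard pruning can be illustrated with a single update step. The following is a minimal NumPy sketch, not the authors' implementation: the layer shape, the random stand-in gradient, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy convolutional layer: 4 filters, each of shape (in_channels=3, 3, 3).
weights = rng.normal(size=(4, 3, 3, 3))
# Stand-in for a gradient from backpropagation (hypothetical values).
grad = rng.normal(size=weights.shape)
lr = 0.1

# Pick the filter with the smallest l2-norm and set it to zero.
norms = np.sqrt((weights ** 2).sum(axis=(1, 2, 3)))
pruned = int(np.argmin(norms))
weights[pruned] = 0.0

# Soft pruning: the zeroed filter is updated like any other filter.
soft = weights - lr * grad

# Hard pruning: a fixed mask holds the pruned filter at zero after the update.
mask = np.ones((4, 1, 1, 1))
mask[pruned] = 0.0
hard = (weights - lr * grad) * mask

print("soft-pruned filter norm after update:", np.linalg.norm(soft[pruned]))
print("hard-pruned filter norm after update:", np.linalg.norm(hard[pruned]))
```

In the soft case the zeroed filter becomes nonzero again after the update, so it can compete for selection at the next pruning step; in the hard case it stays at zero.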

The algorithm proceeds as follows:

Filter Selection: Filters are evaluated based on their $\ell_2$-norm, and a percentage of filters with the lowest values are selected for pruning.
Filter Pruning: Selected filters are zeroed out but not removed, hence retaining the model's structural integrity.
Reconstruction: During subsequent training, the zeroed filters are updated, potentially regaining their original values or learning new patterns, thereby keeping the model's capacity intact.
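The three steps above can be sketched as a toy loop over a single layer. This is an illustrative sketch under stated assumptions rather than the released code: `filter_l2_norms`, `soft_prune`, and `train_one_epoch` are hypothetical helper names, and the "training" is one gradient step on a simple quadratic loss that stands in for a real epoch of SGD.

```python
import numpy as np

def filter_l2_norms(weights):
    """l2-norm of each filter; weights has shape (out_ch, in_ch, k, k)."""
    return np.sqrt((weights ** 2).sum(axis=(1, 2, 3)))

def soft_prune(weights, prune_rate):
    """Zero, in place, the fraction `prune_rate` of filters with the smallest l2-norm."""
    num_pruned = int(weights.shape[0] * prune_rate)
    pruned_idx = np.argsort(filter_l2_norms(weights))[:num_pruned]
    weights[pruned_idx] = 0.0
    return pruned_idx

def train_one_epoch(weights, target, lr=0.5):
    """Toy stand-in for one epoch: a gradient step on 0.5 * ||W - target||^2."""
    weights -= lr * (weights - target)

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3, 3, 3))
target = rng.normal(size=(8, 3, 3, 3))  # hypothetical "ideal" weights

for epoch in range(5):
    train_one_epoch(weights, target)        # every filter is updated, zeroed ones included
    pruned_idx = soft_prune(weights, 0.25)  # selection and zeroing at the end of the epoch

zero_filters = int((filter_l2_norms(weights) == 0).sum())
print("filters zeroed after the last epoch:", sorted(pruned_idx.tolist()))
print("number of all-zero filters:", zero_filters)
```

Because the set of zeroed filters is recomputed every epoch, a filter pruned early can recover and a different one can be selected later. Only the filters that are zero after the final pruning step would be removed when building the compact model.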

The effectiveness of SFP is empirically validated on two benchmarks: CIFAR-10, using ResNet-20, 32, 56, and 110, and ILSVRC-2012, using ResNet-18, 34, 50, and 101.

Results

The empirical results are compelling:

For ResNet models on CIFAR-10, substantial FLOPs reduction was achieved with minimal accuracy loss, and in some instances, there was a gain in accuracy. For example, pruning 30% of the filters of ResNet-110 resulted in only a 0.30% drop in accuracy without fine-tuning, while with fine-tuning the pruned model was 0.18% more accurate than the baseline.
On the ILSVRC-2012 benchmark, SFP reduced the FLOPs of ResNet-101 by more than 42% while increasing top-5 accuracy by 0.2%.
SFP delivered both a theoretical speedup (measured in FLOPs) and a practical speedup (measured at inference time), indicating its utility in real-world applications where computational resources are constrained.
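The results quote both filter pruning rates and FLOPs reductions, and the two are not the same number. A rough single-layer calculation shows why; this is a back-of-the-envelope sketch in which the layer dimensions and the helper names `conv_flops` and `pruned_channels` are assumptions chosen for illustration.

```python
def conv_flops(out_h, out_w, k, in_ch, out_ch):
    """Multiply-accumulate count of one k x k convolution layer (bias ignored)."""
    return out_h * out_w * k * k * in_ch * out_ch

def pruned_channels(channels, prune_rate):
    """Filters remaining after removing the fraction `prune_rate`."""
    return channels - int(channels * prune_rate)

# Hypothetical layer: 3x3 convolution, 64 -> 64 channels, 32x32 output map.
dense = conv_flops(32, 32, 3, 64, 64)

# Pruning 30% of this layer's filters AND 30% of the previous layer's filters
# shrinks both the output and the input channel count.
kept = pruned_channels(64, 0.30)
sparse = conv_flops(32, 32, 3, kept, kept)

reduction = 1 - sparse / dense
print(f"kept channels: {kept}, FLOPs reduction: {reduction:.1%}")
# prints "kept channels: 45, FLOPs reduction: 50.6%"
```

For an interior layer, the cost scales with the product of the kept input and output channels, so a 30% filter pruning rate removes roughly half of that layer's FLOPs. Whole-network figures depend on which layers are pruned and need not match this single-layer estimate.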

Implications and Future Directions

The SFP method provides a significant improvement over traditional filter pruning techniques by preserving model capacity and reducing dependence on pre-trained models. This could widely impact applications requiring efficient deployment of deep CNNs, particularly on resource-constrained devices such as mobile phones and embedded systems.

Theoretically, SFP advances the understanding of model capacity in pruned networks, suggesting that dynamic pruning and updating can lead to higher performance and potentially uncover new insights into network training dynamics.

Future research could focus on:

Combining SFP with Other Techniques: Integrating SFP with matrix decomposition and low-precision weights could yield even more efficient models.
Broader Application Testing: Extending SFP to other architectures and tasks, such as object detection and segmentation, to validate its versatility.
Hyperparameter Optimization: Fine-tuning the SFP process by exploring optimal intervals for pruning could further enhance its efficacy.
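As a small illustration of the pruning-interval hyperparameter mentioned in the last item, the sketch below lists the epochs after which pruning would run for a given interval. The function name `pruning_epochs` is hypothetical, and forcing a pruning step after the final epoch is an assumption made here so that the final model is sparse.

```python
def pruning_epochs(total_epochs, interval):
    """Epochs (1-indexed) after which soft pruning is applied.

    Assumption: pruning also runs after the final epoch so that the
    returned model has its selected filters at zero.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    epochs = list(range(interval, total_epochs + 1, interval))
    if not epochs or epochs[-1] != total_epochs:
        epochs.append(total_epochs)
    return epochs

for interval in (1, 2, 5):
    print(interval, pruning_epochs(10, interval))
```

An interval of 1 corresponds to pruning at the end of every epoch, as described in the methodology; larger intervals give zeroed filters more updates to recover before the next selection.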

In conclusion, SFP represents a significant step forward in the quest for efficient deep learning models, striking a balance between model compression and performance that traditional methods have struggled to achieve. This paper provides a robust foundation for future explorations in the dynamic pruning and optimization of deep neural networks.
