Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Published 22 May 2017 in cs.NE, cs.CV, and cs.LG | (1705.07565v2)

Abstract: How to develop slim and accurate deep neural networks has become crucial for real- world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most existing methods either fail to significantly compress a well-trained deep network or require a heavy retraining process for the pruned deep network to re-boost its prediction performance. In this paper, we propose a new layer-wise pruning method for deep neural networks. In our proposed method, parameters of each individual layer are pruned independently based on second order derivatives of a layer-wise error function with respect to the corresponding parameters. We prove that the final prediction performance drop after pruning is bounded by a linear combination of the reconstructed errors caused at each layer. Therefore, there is a guarantee that one only needs to perform a light retraining process on the pruned network to resume its original prediction performance. We conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our pruning method compared with several state-of-the-art baseline methods.

Abstract PDF Upgrade to Chat

Citations (470)

View on Semantic Scholar

Summary

The paper presents L-OBS, which prunes network layers using Hessian-derived sensitivity scores to guarantee bounded performance loss.
It extends traditional Optimal Brain Damage and Brain Surgeon methods by computing layer-wise Hessians to reduce computational cost and retraining requirements.
Experimental results on models like LeNet and VGG-16 demonstrate competitive compression ratios and controlled error accumulation across layers.

Overview of "Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon"

Introduction

The paper presents a novel approach to pruning deep neural networks by leveraging a method called Layer-wise Optimal Brain Surgeon (L-OBS). The goal is to achieve model compression while preserving prediction performance with minimal retraining. Traditional pruning methods often lack theoretical guarantees and require extensive retraining. This paper proposes a layer-wise method that addresses these issues by pruning based on second-order derivative information.

Methodology

The approach outlined in this paper builds upon foundational techniques such as Optimal Brain Damage (OBD) and Optimal Brain Surgeon (OBS) but extends these to deep networks. The key innovation is in computing the Hessian matrix for each layer independently, significantly reducing computational overhead.

Objective: The primary goals are to achieve high compression for each layer, maintain a theoretical guarantee on prediction performance, and require only light retraining.
Pruning Criteria: Parameters are selected for pruning based on the sensitivity score derived from the Hessian matrix of the layer. This score measures the change in the error function, utilizing second-order derivatives for precise pruning decisions.
Theoretical Guarantee: The paper provides a formal analysis proving that the overall performance drop is bounded by the sum of reconstructed errors from each layer.

Results

The experimental evaluation demonstrates that L-OBS significantly outperforms state-of-the-art methods in terms of compression ratio and retains accuracy with minimal retraining:

LeNet-300-100 and LeNet-5: Achieved compression ratios of 7% and 8% while maintaining low error rates after pruning.
CIFAR-Net and VGG-16: The method was applied successfully with compression ratios around 7-9% on challenging datasets.
Scalability: L-OBS is efficiently implemented on large-scale networks like AlexNet, offering competitive compression without heavy retraining.

Analysis and Implications

The L-OBS framework reveals several practical implications and insights:

Computational Efficiency: Restricting the computation of the Hessian to individual layers drastically reduces the cost of computing the inverse matrix, making the pruning process feasible for modern deep networks.
Adaptability: The method extends easily to convolutional layers by treating filters as fully connected layers, enabling use in a wide array of architectures.
Error Management: By controlling layer-wise errors and leveraging theoretical guarantees, L-OBS ensures that the accumulated error across layers remains bounded and manageable.

Future Developments

Although L-OBS shows promising results, further research could explore optimizing the method for networks with extensive residual connections, such as ResNet. Additionally, extending the approach to real-time applications in embedded systems may present opportunities for development.

Conclusion

The paper provides an effective and theoretically grounded approach to deep neural network pruning. The Layer-wise Optimal Brain Surgeon method balances the trade-off between compression and accuracy while significantly reducing the need for retraining, marking a substantial contribution to the field of efficient neural network deployment.

Markdown