Advancing Model Pruning via Bi-level Optimization (2210.04092v4)

Published 8 Oct 2022 in cs.LG

Abstract: The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh and novel viewpoint, bi-level optimization (BLO). We show that the BLO interpretation provides a technically-grounded optimization base for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging such bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting the computation efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases, and is computationally as efficient as the one-shot pruning schemes, demonstrating 2-7 times speedup over IMP for the same level of model accuracy and sparsity.

Citations (61)


Summary

The paper introduces BiP, a bi-level optimization method that decouples pruning mask selection from weight retraining to improve efficiency.
It feeds implicit gradients from the retraining problem back into the mask update. In most tested cases it finds better winning tickets than IMP, and it runs 2-7x faster than IMP at the same accuracy and sparsity.
This lets BiP approach the accuracy of iterative pruning at a computational cost comparable to one-shot pruning, making deep neural networks easier to deploy in resource-constrained environments.

Advancing Model Pruning via Bi-level Optimization

The paper presents a new approach to model pruning, a key step in making Deep Neural Networks (DNNs) cheap enough to deploy. As deep learning models spread across domains, they increasingly need to run in resource-constrained environments. Pruning addresses this need by removing a large fraction of a model's weights (promoting weight sparsity) without a significant loss of accuracy. As the Lottery Ticket Hypothesis suggests, it can even improve generalization. The paper introduces a bi-level optimization (BLO)-based method called BiP (Bi-level Pruning) to advance model pruning.

Context and Motivation

Traditional pruning methods such as Iterative Magnitude Pruning (IMP), the core method behind the Lottery Ticket Hypothesis (LTH), alternate between pruning and retraining over many rounds. IMP reliably finds 'winning tickets', i.e., sparse subnetworks that train to accuracy comparable to the dense model. However, its computation cost grows prohibitively as the target pruning ratio increases, because every additional round of pruning requires another round of retraining. One-shot pruning methods are far cheaper but usually fail to find winning tickets as good as those found by IMP. The motivation behind this work is to develop a pruning method that combines the computational efficiency of one-shot pruning with the accuracy of IMP.
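To make the cost of iterative pruning concrete, the sketch below counts how many prune-retrain rounds IMP needs to reach a target sparsity. It assumes a schedule that removes a fixed fraction of the remaining weights per round. The 20% rate and the name `imp_rounds` are illustrative assumptions, not values taken from the paper.

```python
def imp_rounds(target_sparsity, prune_rate=0.2):
    """Count IMP rounds needed to reach target_sparsity (hypothetical schedule).

    Assumes each round removes prune_rate of the *remaining* weights and is
    followed by a full retraining run, so rounds ~ number of retraining runs.
    """
    if not 0.0 <= target_sparsity < 1.0:
        raise ValueError("target_sparsity must be in [0, 1)")
    density = 1.0  # fraction of weights still unpruned
    rounds = 0
    while density > 1.0 - target_sparsity:
        density *= 1.0 - prune_rate  # prune 20% of what is left
        rounds += 1
    return rounds


for s in (0.5, 0.9, 0.99):
    print(f"sparsity {s:.0%}: {imp_rounds(s)} prune-retrain rounds")
```

Under this assumed schedule, reaching 50%, 90% and 99% sparsity takes 4, 11 and 21 rounds. Going from 90% to 99% roughly doubles the number of retraining runs, which illustrates why IMP becomes expensive at high pruning ratios.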

Bi-level Optimization Formulation

The authors recast pruning as a bi-level optimization problem. The framework separates the pruning task (the upper-level problem) from the retraining task (the lower-level problem), but the two remain coupled. The upper level optimizes the pruning mask under a sparsity constraint, and the lower level optimizes the weights of the masked model. BLO provides a principled way to optimize these two coupled objectives together. The authors argue that it also gives a technically grounded basis for an efficient implementation of the prune-retrain paradigm used in IMP.
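One way to write the problem described above is shown below. It is a sketch in assumed notation, not necessarily the paper's exact statement. The mask is m, the weights are θ, ℓ is the training loss, and k is a sparsity budget over n prunable weights. The small ridge weight γ > 0 is an assumption that makes the lower-level solution unique. The pruned model is m ⊙ θ: the upper level chooses the mask, and the lower level retrains the surviving weights for that mask.

```latex
\min_{\mathbf{m} \in \mathcal{S}} \; \ell\big(\mathbf{m} \odot \boldsymbol{\theta}^*(\mathbf{m})\big)
\quad \text{s.t.} \quad
\boldsymbol{\theta}^*(\mathbf{m}) = \arg\min_{\boldsymbol{\theta}} \;
  \ell(\mathbf{m} \odot \boldsymbol{\theta}) + \frac{\gamma}{2}\,\|\boldsymbol{\theta}\|_2^2,
\qquad
\mathcal{S} = \{\mathbf{m} \in \{0,1\}^n : \mathbf{1}^\top \mathbf{m} \le k\}
```

Written this way, the mask and the weights interact only through the elementwise product m ⊙ θ, which is bi-linear in (m, θ).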

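A key element of the method is the use of implicit gradients. When the mask changes, the retrained weights change too, and the gradient of the upper-level objective with respect to the mask accounts for this dependence. This information is fed back into the mask update. BiP is a special class of BLO problem with a bi-linear structure, because the mask and the weights enter the model through their elementwise product. The authors exploit this bi-linearity to show theoretically that BiP can be solved as easily as first-order optimization, so it inherits first-order computational efficiency.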
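The sketch below illustrates what an implicit gradient is on a hypothetical one-coordinate toy problem with the same bi-linear coupling z = m·θ. The toy uses a squared loss ℓ(z) = ½(z − t)² and a ridge-regularized lower level that has a closed-form solution. It computes the exact implicit-function-theorem gradient and checks it against finite differences. This is generic implicit differentiation, not BiP's first-order simplification. The toy loss, the target `t` and the ridge weight `gamma` are assumptions made for illustration.

```python
# Hypothetical toy problem, one coordinate, same bi-linear coupling z = m * theta:
#   upper level: f(m) = l(m * theta*(m)),   l(z) = 0.5 * (z - t)**2
#   lower level: theta*(m) = argmin_theta  l(m * theta) + (gamma / 2) * theta**2


def theta_star(m, t, gamma):
    """Closed-form lower-level solution of m * (m * theta - t) + gamma * theta = 0."""
    return m * t / (m * m + gamma)


def upper_loss(m, t, gamma):
    """Upper-level objective evaluated at the retrained weight theta*(m)."""
    z = m * theta_star(m, t, gamma)
    return 0.5 * (z - t) ** 2


def implicit_grad(m, t, gamma):
    """Exact df/dm via the implicit function theorem (uses second derivatives)."""
    th = theta_star(m, t, gamma)
    r = m * th - t                        # dl/dz at z = m * theta
    h_tt = m * m + gamma                  # d2g/dtheta2 of the lower-level objective g
    h_tm = 2 * m * th - t                 # d2g/(dtheta dm)
    dtheta_dm = -h_tm / h_tt              # implicit function theorem
    direct = r * th                       # partial df/dm holding theta fixed
    through_theta = r * m * dtheta_dm     # contribution via theta*(m)
    return direct + through_theta


def finite_diff(m, t, gamma, h=1e-6):
    """Central-difference check of df/dm."""
    return (upper_loss(m + h, t, gamma) - upper_loss(m - h, t, gamma)) / (2 * h)


print(implicit_grad(0.7, 1.5, 0.3), finite_diff(0.7, 1.5, 0.3))
```

Differentiating this toy by hand gives df/dm = −2γ²mt²/(m² + γ)³, and both numerical estimates reproduce it. In a real network, the inverse lower-level Hessian (here the scalar `h_tt`) is the expensive part. The paper's bi-linearity result is aimed at avoiding that second-order cost.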

Numerical Results and Implications

Extensive experiments on both structured and unstructured pruning, covering 5 model architectures and 4 datasets, demonstrate the effectiveness of BiP. In most cases it finds better winning tickets than IMP. It is also as computationally efficient as one-shot pruning, giving a 2-7x speed-up over IMP at the same level of accuracy and sparsity. In several instances, BiP identifies winning tickets that surpass the accuracy of the original dense networks.
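The sketch below contrasts the two experimental settings on a small weight matrix. Weight magnitude is used as the importance score purely for illustration, whereas BiP learns its mask through optimization. Unstructured pruning zeros individual weights. Structured pruning zeros whole rows, which stand in for filters or channels. The function names are hypothetical.

```python
def unstructured_prune(W, keep):
    """Zero all but the `keep` largest-magnitude individual weights."""
    entries = [(i, j) for i in range(len(W)) for j in range(len(W[i]))]
    by_magnitude = sorted(entries, key=lambda ij: abs(W[ij[0]][ij[1]]), reverse=True)
    kept = set(by_magnitude[:keep])
    return [[W[i][j] if (i, j) in kept else 0.0 for j in range(len(W[i]))]
            for i in range(len(W))]


def structured_prune(W, keep_rows):
    """Zero all but the `keep_rows` rows with the largest L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in W]
    kept = set(sorted(range(len(W)), key=lambda i: norms[i], reverse=True)[:keep_rows])
    return [list(row) if i in kept else [0.0] * len(row) for i, row in enumerate(W)]


W = [[0.9, -0.1, 0.2],
     [0.05, 0.7, -0.02],
     [-0.6, 0.4, 0.5]]
print(unstructured_prune(W, keep=3))     # individual zeros scattered across rows
print(structured_prune(W, keep_rows=2))  # an entire row is removed
```

Unstructured sparsity usually preserves accuracy better at a given ratio. Structured sparsity removes whole units, which standard dense hardware can exploit directly.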

These results suggest that BiP can close the gap between pruning accuracy and pruning efficiency that traditional methods leave open. BiP prunes directly to a target sparsity level instead of going through repeated prune-retrain rounds. This makes it well suited to deployment settings where compute and time are limited.
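Pruning directly to a target sparsity level requires turning continuous mask scores into a binary mask that satisfies the sparsity budget. A common way to do this is to keep the k highest-scoring entries. This top-k rule is assumed here for illustration rather than taken from the paper. The sketch then applies the mask to the weights. The function names are hypothetical.

```python
import heapq


def topk_mask(scores, k):
    """Binary mask keeping the k largest scores (ties broken by lower index)."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    # heapq.nlargest matches sorted(..., reverse=True)[:k], so ties keep index order
    keep = set(heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i]))
    return [1 if i in keep else 0 for i in range(len(scores))]


def apply_mask(mask, weights):
    """Equivalent to the elementwise product m * theta for a 0/1 mask."""
    return [w if m else 0.0 for m, w in zip(mask, weights)]


scores = [0.9, 0.1, 0.5, 0.7, 0.2, 0.05]
weights = [1.2, -0.4, 0.3, -2.0, 0.8, 0.6]
mask = topk_mask(scores, k=2)  # keep 2 of 6 weights, about 67% sparsity
print(mask, apply_mask(mask, weights))
```

Because k is fixed up front, the resulting model has exactly the requested sparsity. No schedule of intermediate sparsity levels is needed.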

Broader Impact and Future Directions

The adoption of BLO for model pruning not only advances the theoretical understanding of pruning mechanisms but also paves the way for practical applications that can leverage sparsity for model deployment on constrained hardware. Given the rising importance of deploying AI models on edge devices, methods like BiP, which offer high efficiency and accuracy retention, hold significant promise.

Future research could explore more advanced BLO algorithms to further optimize the retraining steps, possibly combined with more sophisticated dynamic data sampling techniques for better generalization. Adapting BiP to other forms of network compression and to architecture search would extend this work further into the broader area of machine learning efficiency.
