CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning (2311.14272v2)
Abstract: Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user encounters only a limited selection of classes regularly. This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes. Existing works rely on unstructured pruning, which introduces randomly distributed non-zero values in the model, making it unsuitable for hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only minimal compression and may lead to reduced model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines both fine-grained N:M structured sparsity and coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14$\times$ reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.
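The abstract describes two key ingredients: a gradient-based, class-aware saliency score computed on user-specific classes, and a hybrid mask that combines coarse block sparsity with fine-grained N:M sparsity. The sketch below is an illustrative, simplified rendering of those two ideas, not the authors' implementation; all function names, the `block_rows`/`block_keep` parameters, and the coarse-then-fine ordering are assumptions for exposition. See the paper's repository (https://github.com/shivmgg/CRISP/) for the actual method.

```python
# Illustrative sketch of CRISP-style hybrid pruning (hypothetical names, not the official code).
import torch


def class_aware_saliency(model, loss_fn, user_batches):
    """Accumulate |weight * gradient| over batches drawn only from the user's classes."""
    saliency = {n: torch.zeros_like(p)
                for n, p in model.named_parameters() if p.dim() >= 2}
    for x, y in user_batches:                      # samples restricted to user-specific classes
        model.zero_grad()
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in saliency and p.grad is not None:
                    saliency[n] += (p * p.grad).abs()   # first-order Taylor-style score
    return saliency


def hybrid_mask(score, n=2, m=4, block_rows=32, block_keep=0.5):
    """Hybrid mask for a 2D score matrix: coarse block pruning, then N:M sparsity inside kept blocks.

    Assumes score.shape[1] is divisible by m (e.g. 2:4 sparsity as in sparse tensor cores).
    Conv kernels would first be flattened to 2D.
    """
    rows, cols = score.shape
    mask = torch.zeros_like(score)

    # Coarse stage: rank row-blocks by total saliency and keep the top fraction.
    pad = (-rows) % block_rows
    blocks = torch.nn.functional.pad(score, (0, 0, 0, pad)).reshape(-1, block_rows, cols)
    keep = blocks.sum(dim=(1, 2)).topk(max(1, int(block_keep * blocks.size(0)))).indices

    # Fine stage: within each kept block, keep the top-n weights of every m consecutive weights.
    for b in keep.tolist():
        r0, r1 = b * block_rows, min((b + 1) * block_rows, rows)
        groups = score[r0:r1].reshape(-1, m)
        topk = groups.topk(n, dim=1).indices
        group_mask = torch.zeros_like(groups).scatter_(1, topk, 1.0)
        mask[r0:r1] = group_mask.reshape(r1 - r0, cols)
    return mask
```

In this sketch, multiplying a layer's weight by `hybrid_mask(saliency[name])` would zero out low-saliency weights while preserving both the block structure (friendly to coarse-grained accelerators) and the 2:4 pattern supported by sparse tensor cores; the real framework additionally fine-tunes the masked model to recover accuracy.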
Authors: Shivam Aggarwal, Kuluhan Binici, Tulika Mitra