Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging

Published 29 Jun 2023 in cs.LG and cs.AI | (2306.16788v3)

Abstract: Neural networks can be significantly compressed by pruning, yielding sparse models with reduced storage and computational demands while preserving predictive performance. Model soups (Wortsman et al., 2022) enhance generalization and out-of-distribution (OOD) performance by averaging the parameters of multiple models into a single one, without increasing inference time. However, achieving both sparsity and parameter averaging is challenging as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. This work addresses these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varied hyperparameter configurations such as batch ordering or weight decay yields models suitable for averaging, sharing identical sparse connectivity by design. Averaging these models significantly enhances generalization and OOD performance over their individual counterparts. Building on this, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model from the previous phase. SMS preserves sparsity, exploits sparse network benefits, is modular and fully parallelizable, and substantially improves IMP's performance. We further demonstrate that SMS can be adapted to enhance state-of-the-art pruning-during-training approaches.

Summary

The paper’s main contribution is Sparse Model Soups (SMS), a method that starts each prune-retrain cycle from the averaged model of the previous phase, which preserves sparse connectivity and improves generalization.
It integrates iterative magnitude pruning with model averaging to achieve up to a 2% accuracy improvement on benchmarks like CIFAR-10/100 and ImageNet.
The approach offers practical benefits for resource-constrained environments and opens avenues for further research in balancing sparsity with neural model robustness.

Sparse Model Soups: A Synthesis for Enhanced Pruning via Model Averaging

This paper addresses a tension in sparse neural networks: pruning produces sparsity, but averaging independently pruned models, as model soups do, erodes that sparsity because the models' sparse connectivities differ. Sparse Model Soups (SMS), the method introduced, combines the two techniques without losing sparsity and improves generalization and out-of-distribution (OOD) performance.

Sparse neural networks, obtained by pruning, significantly reduce storage and computational requirements while preserving predictive performance. Merging multiple sparse models into a single one through parameter averaging (model soups) has nevertheless not been straightforward, because independently pruned models have different sparsity patterns and their average is denser than either model. SMS addresses this by averaging only sparse models that share identical sparse connectivity, thereby preserving sparsity.
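The effect of differing sparsity patterns can be illustrated with a small NumPy sketch. It is not taken from the paper: the helper names `magnitude_mask` and `sparsity_of`, the random weight vectors, and the 90% sparsity level are all assumptions chosen for illustration.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Boolean mask that keeps the largest-magnitude (1 - sparsity) share of weights."""
    k = int(round(sparsity * weights.size))  # number of weights to prune
    mask = np.ones(weights.size, dtype=bool)
    order = np.argsort(np.abs(weights), axis=None)  # smallest magnitude first
    mask[order[:k]] = False
    return mask.reshape(weights.shape)

def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    return float(np.mean(weights == 0))

rng = np.random.default_rng(0)
w_a = rng.normal(size=1000)  # hypothetical weights of model A
w_b = rng.normal(size=1000)  # hypothetical weights of model B

# Case 1: each model is pruned on its own, so the two masks differ.
sparse_a = w_a * magnitude_mask(w_a, 0.9)
sparse_b = w_b * magnitude_mask(w_b, 0.9)
avg_independent = (sparse_a + sparse_b) / 2

# Case 2: both models use the same mask (here: the mask derived from model A).
shared = magnitude_mask(w_a, 0.9)
avg_shared = (w_a * shared + w_b * shared) / 2

print("single model:        ", sparsity_of(sparse_a))
print("avg, different masks:", sparsity_of(avg_independent))
print("avg, shared mask:    ", sparsity_of(avg_shared))
```

With different masks, the average is nonzero wherever either model is nonzero, so its sparsity falls below that of the individual models. With a shared mask, the average keeps the sparsity of its inputs.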

Methodology and Contributions

The paper's primary innovation, SMS, extends Iterative Magnitude Pruning (IMP): each prune-retrain phase starts from the average of the models retrained in the previous phase. Because all candidates within a phase are retrained from the same pruned model, they share identical sparse connectivity by design, so their average is as sparse as each of them. Carrying the averaged model forward improves both the sparse model's performance and its generalization capability.

SMS is developed through the following key steps:

A pretrained model is pruned by removing its low-magnitude weights.
Several copies of the pruned model are retrained under different hyperparameter configurations (for example batch ordering or weight decay), yielding diverse candidates that all share the same sparse connectivity.
These candidates are averaged to form a Sparse Model Soup, which keeps the sparse structure because the connectivity is shared.
The averaged model becomes the starting point of the next prune-retrain cycle.
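The steps above can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: the masked least-squares `retrain` function stands in for real network retraining, varying only the batch order via a seed stands in for the hyperparameter variation, and all names and values (`prune_by_magnitude`, `sparse_model_soup`, 3 phases, 50% pruning per phase, 4 candidates) are assumptions.

```python
import numpy as np

def prune_by_magnitude(w, mask, fraction):
    """Deactivate `fraction` of the still-active weights with the smallest magnitude."""
    active = np.flatnonzero(mask)
    k = int(round(fraction * active.size))
    order = active[np.argsort(np.abs(w[active]))]  # active indices, smallest first
    new_mask = mask.copy()
    new_mask[order[:k]] = False
    return new_mask

def retrain(w, mask, X, y, seed, epochs=20, lr=0.05, batch=16):
    """Hypothetical stand-in for retraining: masked mini-batch SGD on least squares.
    The seed only changes the batch ordering."""
    rng = np.random.default_rng(seed)
    w = w * mask
    n = X.shape[0]
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / idx.size
            w = (w - lr * grad) * mask  # pruned weights stay at zero
    return w

def sparse_model_soup(w0, X, y, phases=3, fraction=0.5, m=4):
    """Prune, retrain m candidates with a shared mask, average, and repeat."""
    w, mask = w0.copy(), np.ones(w0.shape, dtype=bool)
    for phase in range(phases):
        mask = prune_by_magnitude(w, mask, fraction)
        candidates = [retrain(w, mask, X, y, seed=100 * phase + i) for i in range(m)]
        w = np.mean(candidates, axis=0)  # the soup: same mask, so sparsity is kept
    return w, mask

# Toy data: 64 features, only the first 8 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 64))
w_true = np.zeros(64)
w_true[:8] = rng.normal(size=8)
y = X @ w_true

# "Pretrained" dense model: one unmasked training run.
w_dense = retrain(np.zeros(64), np.ones(64, dtype=bool), X, y, seed=0)
w_soup, final_mask = sparse_model_soup(w_dense, X, y)
print("active weights:", int(final_mask.sum()), "of", final_mask.size)
```

The candidate runs inside a phase are independent of each other, which is why they can be executed in parallel; only the averaging step needs all of them.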

The paper reports experiments on benchmarks such as CIFAR-10/100 and ImageNet, demonstrating SMS's effectiveness across architectures and tasks. The results point to SMS's superior performance over standard IMP and over variants such as IMP with extended retraining ($\text{IMP}_{m\times}$) and IMP with repruning (IMP-RePrune).

Numerical Results and Insights

Numerical experiments show that SMS delivers consistent improvements in test accuracy. The results are especially compelling for high target sparsities (98% and beyond), where SMS maintains robust performance even as approaches that merge models with diverging connectivity lose sparsity. SMS outperforms baselines such as extended-retraining IMP ($\text{IMP}_{m\times}$) and IMP-RePrune by up to 2% in accuracy, with improved generalization and OOD performance.
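To put a target such as 98% in perspective, the short calculation below shows how sparsity accumulates over prune-retrain phases. It assumes that every phase removes the same fraction of the weights that are still active; that schedule and the example fraction of 20% are illustrative assumptions, not values from the paper.

```python
import math

def sparsity_after(phases, fraction):
    """Overall sparsity after `phases` rounds, each pruning `fraction` of the remaining weights."""
    return 1.0 - (1.0 - fraction) ** phases

def phases_needed(target, fraction):
    """Smallest number of phases whose cumulative sparsity reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))

k = phases_needed(0.98, 0.2)
print(k, round(sparsity_after(k, 0.2), 4))
```

Under these assumptions the remaining density shrinks geometrically, as (1 - fraction) raised to the number of phases, so high sparsities take many phases when the per-phase fraction is small.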

SMS's utility extends beyond just enhancing sparse model averaging; it demonstrates adaptability in pruning-during-training methodologies like GMP and DPF. By integrating SMS into these frameworks, performance gains emphasize SMS's modular design and its potential for broader applicability across various sparsification methodologies.

Theoretical and Practical Implications

Theoretically, SMS suggests a robust model synthesis approach that adeptly balances between sparsity and model robustness, paving the way for more efficient learning strategies that make effective use of existing models without extensive retraining from scratch. Practically, the potential for parallelization and embedding into existing frameworks promises substantial improvements in resource-limited environments where efficiency is a paramount concern.

Outlook and Future Research Directions

The SMS framework opens promising avenues for future research in both theoretical and applied AI. Exploring further integration within dynamic sparse training paradigms or extending its application into more complex transfer learning scenarios could yield deeper insights and broader applicability. Furthermore, investigating the interplay between sparsity, regularization, and neural architecture choices stands as a potential research frontier catalyzed by SMS's insights.

In summary, Sparse Model Soups present a significant contribution to the field of neural model compression, revealing a sophisticated interplay between pruning efficiency and parameter averaging. This work stands as a testament to the viability of model averaging within the structured constraints of sparse connectivity, reiterating the potential for such techniques in advancing the efficiency and adaptability of neural architectures.