
MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters

(2311.04251)
Published Nov 7, 2023 in cs.LG, cs.AI, and cs.CV

Abstract

Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes. If expanding the network's size is needed, it is necessary to retrain from scratch, which is expensive. To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size. However, this naive approach falls short in practice as it brings too much noise to the growing process. Prior work tackled this issue by leveraging the already learned weights and training data to generate new weights through a computationally expensive analysis step. In this paper, we introduce MixtureGrowth, a new approach to growing networks that circumvents the initialization overhead of prior work. Before growing, each layer in our model is generated as a linear combination of parameter templates. Newly grown layer weights are generated using new linear combinations of a layer's existing templates. On one hand, these templates are already trained for the task, providing a strong initialization. On the other, the new coefficients provide flexibility for the added layer weights to learn something new. We show that our approach boosts top-1 accuracy over the state of the art by 2-2.5% on CIFAR-100 and ImageNet, while achieving performance comparable to a larger network trained from scratch with fewer FLOPs. Code is available at https://github.com/chaudatascience/mixturegrowth.

Overview

  • MixtureGrowth introduces a technique for expanding neural networks by reusing learned weights through new linear combinations, aiming to combine the efficiency of small models with the superior performance of larger ones without full retraining.

  • The method uses parameter templates and linear combinations of these templates to add new weights, preserving computational efficiency and learned representations.

  • Experimental results show a 2-2.5% improvement in top-1 accuracy over the state of the art on CIFAR-100 and ImageNet, along with performance comparable to a larger network trained from scratch at lower computational cost, demonstrating the method's efficacy.

  • Future research directions include exploring recombination strategies across various architectures and refining the template and combination processes to further improve scalability and efficiency.

MixtureGrowth: An Efficient Approach for Increasing Neural Network Size through Recombination of Learned Parameters

Introduction

In the quest to enhance the performance of deep neural networks, researchers have sought various strategies, including neural architecture search (NAS), knowledge distillation, and parameter pruning, among others. These approaches, while effective, often result in models that optimize inference performance at the cost of increased computational complexity during the training phase. An alternative strategy that has gained interest involves starting with a smaller network model and progressively growing its size. This approach benefits from the initial reduced computational requirement of smaller models and the eventual superior performance of larger networks. However, the critical challenge lies in expanding the network size without necessitating a complete retraining from scratch, which could nullify the computational savings.

MixtureGrowth Methodology

MixtureGrowth introduces a novel technique for growing neural networks by reusing already learned weights. At its core, the idea is to enlarge a network by introducing new weights that are linear combinations of pre-existing parameter templates. This maintains computational efficiency, since learned parameters are reused, and ensures that the expanded network inherits the learned representations. More specifically, the approach sidesteps the expensive analysis step that prior work required to initialize new weights: expansion amounts to forming new linear combinations of the already trained templates.
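
To make the template-mixing idea concrete, the sketch below shows a convolutional layer whose weight tensor is a coefficient-weighted sum of shared templates. This is a minimal illustration, not the authors' implementation: the class name TemplateMixedConv2d, the template bank shape, and the coefficient initialization are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemplateMixedConv2d(nn.Module):
    """Conv layer whose weight is a per-layer linear combination of shared templates.

    Illustrative sketch only: the template bank shape and coefficient
    initialization are assumptions, not the paper's configuration.
    """

    def __init__(self, templates: nn.Parameter):
        super().__init__()
        # templates: (num_templates, out_ch, in_ch, k, k), shared across layers
        self.templates = templates
        num_templates = templates.shape[0]
        # one trainable mixing coefficient per template for this layer
        self.coeffs = nn.Parameter(torch.randn(num_templates) / num_templates ** 0.5)

    def forward(self, x):
        # weight = sum_t coeffs[t] * templates[t]
        weight = torch.einsum('t,toihw->oihw', self.coeffs, self.templates)
        return F.conv2d(x, weight, padding=weight.shape[-1] // 2)


# usage: one bank of 4 templates for 3x3, 64-channel convs, shared by two
# layers that differ only in their mixing coefficients
bank = nn.Parameter(torch.randn(4, 64, 64, 3, 3) * 0.01)
layer_a, layer_b = TemplateMixedConv2d(bank), TemplateMixedConv2d(bank)
out = layer_b(layer_a(torch.randn(1, 64, 32, 32)))
```

Because the templates are shared, layers built on the same bank differ only in their small coefficient vectors, which is what makes growing by adding new coefficient vectors cheap.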

  • Parameter Templates and Linear Combinations: A network designated to grow benefits from a mechanism that integrates newly generated weights without disturbing the learned representations. MixtureGrowth achieves this by maintaining a set of parameter templates from which the smaller model's layer weights are already composed (as sketched above); new weights are then introduced as fresh linear combinations of these same templates.
  • Growth Strategies and Implementation: A pivotal aspect of the growth process is how to initialize the new sets of linear coefficients for the added weights. Through experimental analysis, the paper identifies orthogonal initialization as an effective choice, promoting diversity and robustness in the new weights; a sketch of this growth step follows this list.
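
As a rough illustration of the growth step itself, the sketch below assembles a width-doubled weight from the already trained templates: the original block keeps its trained coefficients, while three newly added blocks use fresh coefficient vectors drawn with orthogonal initialization. The function name grow_layer, the 2x2 block layout, and the specific orthogonal draw are assumptions for illustration, not the paper's exact procedure.

```python
import torch


def grow_layer(templates, old_coeffs):
    """Assemble a width-doubled weight from the already trained templates.

    templates:  (T, out_ch, in_ch, k, k) shared, trained templates
    old_coeffs: (T,) trained coefficients of the original weight block
    Returns a (2*out_ch, 2*in_ch, k, k) weight. Sketch only: the 2x2 block
    layout and the orthogonal draw of new coefficients are assumptions.
    """
    num_templates = templates.shape[0]
    # three new coefficient vectors for the added blocks; drawing them
    # mutually orthogonal encourages the newly generated blocks to be diverse
    new_coeffs = torch.nn.init.orthogonal_(torch.empty(3, num_templates))

    def block(c):
        # generate one weight block from one coefficient vector
        return torch.einsum('t,toihw->oihw', c, templates)

    top = torch.cat([block(old_coeffs), block(new_coeffs[0])], dim=1)      # along input channels
    bottom = torch.cat([block(new_coeffs[1]), block(new_coeffs[2])], dim=1)
    return torch.cat([top, bottom], dim=0)                                 # along output channels


# usage: grow a 64-channel block into a 128-channel weight
templates = torch.randn(4, 64, 64, 3, 3)
big_weight = grow_layer(templates, old_coeffs=torch.randn(4))
print(big_weight.shape)  # torch.Size([128, 128, 3, 3])
```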

Experimental Findings

MixtureGrowth demonstrates its effectiveness through substantial improvements in top-1 accuracy when growing models on CIFAR-100 and ImageNet, at reduced computational cost. Key findings include:

  • Up to 2.5% improvement in top-1 accuracy on the CIFAR-100 dataset over state-of-the-art methods under equivalent computational constraints.
  • Comparable performance to larger networks trained from scratch while requiring significantly fewer FLOPs, showcasing the efficiency of the approach.

Analysis and Future Directions

Several key insights emerge from experimenting with MixtureGrowth, notably the impact of the initialization strategy for the linear coefficients and the choice of when to grow during training. The analysis suggests that initializing the new weights' coefficients orthogonally after growth leads to larger performance gains.
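
As a small, hedged illustration of why orthogonal coefficients promote diversity, the snippet below compares randomly drawn coefficient vectors with orthogonally initialized ones using average pairwise cosine similarity; the diagnostic and the template count are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

num_templates = 16                                                  # assumed template count
rand_c = torch.randn(3, num_templates)                              # randomly drawn coefficient vectors
orth_c = torch.nn.init.orthogonal_(torch.empty(3, num_templates))   # orthogonally initialized vectors


def mean_abs_cosine(c):
    # average pairwise |cosine similarity| between the coefficient vectors
    c = F.normalize(c, dim=1)       # unit-norm rows
    sims = c @ c.t()                # pairwise cosine similarities
    off_diag = sims - torch.eye(c.shape[0])
    return off_diag.abs().sum() / (c.shape[0] * (c.shape[0] - 1))


print(mean_abs_cosine(rand_c))   # typically well above 0: correlated directions
print(mean_abs_cosine(orth_c))   # ~0: the new coefficient vectors are mutually orthogonal
```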

One promising avenue for future research could involve investigating the recombination and growth strategies across different network architectures and task domains. Furthermore, refinements in template selection and the linear combination process could extend the methodology's applicability, potentially opening new paths toward dynamically scalable neural networks that efficiently adapt to varying computational resources and task complexities.

Conclusion

MixtureGrowth presents a compelling strategy for increasing neural network size with minimal computational overhead, leveraging the strength of parameter recombination. Its ability to significantly boost performance while maintaining or even reducing the total computational cost poses an exciting prospect for the development of more efficient and adaptable neural networks.
