A Survey of Lottery Ticket Hypothesis

(arXiv:2403.04861)
Published Mar 7, 2024 in cs.LG and cs.NE

Abstract

The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a highly sparse subnetwork (i.e., a winning ticket) that, when trained in isolation, can match or even exceed the performance of the original model. While LTH has been supported both empirically and theoretically in many works, open issues such as efficiency and scalability remain to be addressed. Moreover, the lack of open-source frameworks and of consensus on experimental settings poses a challenge to future research on LTH. We examine, for the first time, previous research and studies on LTH from different perspectives, discuss issues in existing works, and list potential directions for further exploration. This survey aims to provide an in-depth look at the state of LTH and to develop a duly maintained platform for conducting experiments and comparing against the most up-to-date baselines.

Figure: Taxonomy outlining the Lottery Ticket Hypothesis in machine learning models and pruning methods.

Overview

  • The Lottery Ticket Hypothesis (LTH) suggests that within large neural networks, there are smaller, efficient subnetworks ('winning tickets') that can perform as well as or better than the original model.

  • This paper reviews the theoretical support for LTH, its application to various neural network architectures, including CNNs, GNNs, and Transformers, and its potential in improving model efficiency.

  • It discusses key insights from empirical studies on LTH, such as the impact of pruning strategies and the identification of high-performing subnetworks at early training stages.

  • Future challenges include broadening LTH's applicability to newer models and enhancing understanding to guide neural network design, with implications for AI safety, ethics, and efficiency.

Survey of the Lottery Ticket Hypothesis: Insights and Applications

Introduction to Lottery Ticket Hypothesis (LTH)

The Lottery Ticket Hypothesis (LTH) posits that within large, dense neural network models, there exist smaller, sparse subnetworks—termed "winning tickets"—that can achieve comparable or improved performance relative to the original network when trained in isolation. Pioneered by Frankle and Carbin, the hypothesis challenges conventional perceptions of network pruning and provides a promising direction for enhancing model efficiency. This paper presents a comprehensive survey of LTH, shedding light on its theoretical underpinnings, extension to special models, and key factors influencing winning ticket identification. Furthermore, it explores algorithmic advancements aimed at optimizing LTH's practicality while delving into its intersection with broader subjects such as robustness, fairness, and federated learning.
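The train-prune-rewind loop underlying this line of work (iterative magnitude pruning, as introduced by Frankle and Carbin) can be sketched as follows. This is a minimal NumPy illustration, not the survey's actual experimental setup: `train_fn`, the flat weight array, and the pruning schedule are stand-ins for a real training loop over layered tensors.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))          # number of weights to remove
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

def iterative_magnitude_pruning(init_weights, train_fn, rounds, prune_rate):
    """Frankle & Carbin-style IMP: train, prune by magnitude, rewind to the init.

    `train_fn` is a placeholder for a full training run on the masked network.
    """
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        weights = train_fn(weights * mask)        # train the current subnetwork
        # prune prune_rate of the *remaining* (unmasked) weights by magnitude
        remaining = np.abs(weights) * mask
        k = int(round(prune_rate * mask.sum()))
        if k > 0:
            alive = remaining[mask == 1]
            threshold = np.partition(alive, k - 1)[k - 1]
            mask = mask * (remaining > threshold)
        weights = init_weights * mask             # rewind survivors to their init values
    return mask, weights
```

The final `(mask, weights)` pair is the candidate winning ticket: the surviving connections reset to their original initialization, ready to be "trained in isolation" as the hypothesis prescribes.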

Theoretical Foundations of LTH

Remarkable strides have been made in providing theoretical evidence supporting LTH's claims. Research demonstrates that given a sufficiently over-parameterized network, there exists a subnetwork capable of replicating the full network's performance. This has been extended to demonstrating the existence of strong lottery tickets—subnetworks that achieve high performance without any training at all. The theoretical exploration also encompasses convolutional neural networks (CNNs) and generalizes to other architectures, such as Transformers and GNNs, providing a robust theoretical basis for LTH across a variety of network architectures.

Special Models: Extending LTH Beyond Conventional Architectures

The application of LTH extends beyond traditional dense networks to specialized models such as Graph Neural Networks (GNNs), Transformers, and Generative Models. Each of these models presents unique challenges and considerations for applying LTH, from addressing graph structure sparsity in GNNs to identifying transferable subnetworks in pre-trained transformers and generative models. The adaptability of LTH to these special cases underscores its broad applicability and potential impact across different domains of AI research.

Key Insights from Experimental Investigations

Empirical studies have elucidated several key insights regarding LTH, such as how far a network can be pruned without compromising model accuracy, and the roles of specific factors like zeros, signs, and the supermask. The concept of early-bird tickets emphasizes the potential for identifying winning tickets early in the training process, significantly reducing computational costs. Furthermore, comparisons between layer-wise and global pruning strategies offer a nuanced understanding of how sparsity is distributed across layers and how that distribution affects model performance.
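The layer-wise versus global distinction can be made concrete with a small magnitude-pruning sketch. The NumPy arrays below stand in for real weight tensors, and the values are made up for illustration:

```python
import numpy as np

def layerwise_masks(layers, sparsity):
    """Prune each layer independently at the same target sparsity."""
    masks = []
    for w in layers:
        k = int(round(sparsity * w.size))
        thresh = np.sort(np.abs(w).ravel())[k - 1] if k > 0 else -np.inf
        masks.append((np.abs(w) > thresh).astype(w.dtype))
    return masks

def global_masks(layers, sparsity):
    """Prune across all layers jointly: one magnitude threshold for the whole net."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers])
    k = int(round(sparsity * all_mags.size))
    thresh = np.sort(all_mags)[k - 1] if k > 0 else -np.inf
    return [(np.abs(w) > thresh).astype(w.dtype) for w in layers]
```

Because global pruning applies a single threshold over all parameters, layers whose weights have smaller typical magnitudes end up sparser than others, which is exactly the non-uniform sparsity distribution the empirical comparisons examine.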

Algorithmic Advancements for LTH

Innovation in algorithms has been pivotal in addressing the practical challenges associated with LTH, particularly regarding efficiency and the cost of iterative retraining. Approaches such as Continuous Sparsification, Dual Lottery Ticket Hypothesis (DLTH), and structured pruning algorithms aim to streamline the process of identifying winning tickets. These advancements not only reduce the computational burden but also enhance the flexibility and applicability of LTH in real-world scenarios.
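As one illustrative instance, Continuous Sparsification replaces the binary pruning mask with a differentiable relaxation—roughly, a sigmoid over learnable per-weight scores whose temperature is annealed during training, so that the soft mask hardens into a binary one. The minimal sketch below shows only that relaxation-to-hard-mask mechanism, not the full training procedure from the original paper:

```python
import numpy as np

def soft_mask(scores, beta):
    """sigmoid(beta * s): a differentiable surrogate for the binary mask.

    Gradients can flow through this during training, unlike a hard 0/1 mask.
    """
    return 1.0 / (1.0 + np.exp(-beta * scores))

def final_mask(scores):
    """As the temperature beta is annealed upward, the soft mask converges
    to this hard threshold: keep a weight iff its score is positive."""
    return (scores > 0).astype(float)
```

Annealing `beta` gradually commits each weight to "kept" or "pruned" without the repeated retraining rounds that iterative magnitude pruning requires, which is the efficiency gain these algorithmic variants target.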

Intersection with Broader Topics

LTH's implications extend into areas such as model robustness, fairness, federated learning, and reinforcement learning, highlighting its relevance to current challenges in AI safety, ethics, and distributed computing. By exploring the connections between LTH and these subjects, the survey underscores the multifaceted impact of LTH on enhancing model efficiency, security, and equitable AI practices.

Future Directions and Open Issues

Despite its promising prospects, LTH faces open questions and challenges that warrant further exploration. These include accelerating the identification of winning tickets in practice, deepening the theoretical understanding needed to guide better network design, extending LTH to emerging models such as diffusion models, and more. Addressing these issues will be crucial for realizing LTH's full potential and its application in developing more efficient, robust, and equitable AI systems.

Conclusion

This survey offers a panoramic view of the Lottery Ticket Hypothesis, encapsulating its theoretical foundations, practical algorithms, and broader implications. As LTH continues to evolve and intersect with various facets of AI research, it holds the promise of guiding the future direction of neural network design and optimization, heralding a new era of efficient and powerful AI systems.
