- The paper establishes a mathematical framework to quantify the number of linear regions in ReLU networks, showing that at initialization the region count grows linearly with the number of neurons rather than exponentially.
- It demonstrates that the volume of boundaries between linear regions also scales linearly with neuron count, implying that practical network expressivity falls far short of theoretical limits.
- The study finds that characteristics of linear regions remain stable during training, challenging conventional views on the advantages of increased network depth.
Complexity of Linear Regions in Deep Networks: A Critical Examination
The paper "Complexity of Linear Regions in Deep Networks" by Boris Hanin and David Rolnick presents a rigorous mathematical framework for understanding the expressivity of neural networks with piecewise linear activation functions, specifically focusing on the ReLU activation. The authors aim to quantify the number of linear regions such networks can form and evaluate how these regions evolve during training. This investigation is particularly relevant for elucidating the relationship between network structure and its functional capacity.
Core Contributions
The paper makes several noteworthy contributions:
- Mathematical Framework for Counting Linear Regions: The authors establish a comprehensive mathematical approach to counting the linear regions produced by a neural network. They show that, at initialization, the number of regions along any one-dimensional line through input space grows linearly with the total number of neurons, in sharp contrast to the exponential growth that worst-case constructions permit (see the sketch after this list). This observation is substantiated through empirical studies and argues against the assumption that typical networks attain their theoretical expressivity bounds.
- Volume of Boundary Regions: Another significant contribution is the analysis of the volume of boundaries between linear regions. The paper demonstrates that this volume scales linearly with the number of neurons at initialization. The implication is that deep networks' practical expressivity, as reflected in linear region boundaries, may be far less than the theoretical maximum.
- Impact of Initialization and Training: The authors provide evidence that these characteristics of linear regions remain roughly constant throughout training, further supporting the notion that practical networks do not approach their theoretical expressivity, either at initialization or during training.
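To make the counting argument concrete, here is a minimal sketch (not the authors' code; the architectures, line endpoints, and sampling resolution are arbitrary illustrative choices). It estimates how many linear regions a randomly initialized ReLU network induces along a one-dimensional line by tracking changes in the hidden neurons' on/off activation pattern; each pattern change marks a crossing of a region boundary, so the same count doubles as a one-dimensional proxy for boundary density, and comparing a shallow-wide and a deep-narrow network with the same neuron budget previews the depth-independence discussed later.

```python
# Minimal sketch (not the authors' code): count the linear regions a randomly
# initialized ReLU network induces along a 1D line in input space by tracking
# changes in the hidden neurons' on/off pattern.  Architectures, the line's
# endpoints, and the sampling resolution are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(widths):
    """He-style random initialization for a fully connected ReLU network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def activation_pattern(params, x):
    """Concatenated on/off pattern of every hidden ReLU at input x."""
    pattern, h = [], x
    for W, b in params[:-1]:               # skip the final linear output layer
        z = W @ h + b
        pattern.append(z > 0)
        h = np.maximum(z, 0.0)
    return np.concatenate(pattern)

def regions_along_line(params, a, b, n_samples=20000):
    """Number of linear regions the segment from a to b passes through."""
    ts = np.linspace(0.0, 1.0, n_samples)
    patterns = [activation_pattern(params, (1 - t) * a + t * b) for t in ts]
    # Each pattern change between consecutive samples ~ one boundary crossing
    # (assuming the sampling is fine enough); regions = crossings + 1.
    crossings = sum(not np.array_equal(p, q) for p, q in zip(patterns, patterns[1:]))
    return crossings + 1

d_in = 16
a, b = rng.normal(size=d_in), rng.normal(size=d_in)

# Same total number of hidden neurons (192), arranged shallow-and-wide vs. deep-and-narrow.
for widths in ([d_in, 192, 1], [d_in, 64, 64, 64, 1]):
    params = init_mlp(widths)
    n_neurons = sum(widths[1:-1])
    print(f"hidden widths {widths[1:-1]}: {n_neurons} neurons, "
          f"{regions_along_line(params, a, b)} regions along the line")
```

For random networks of this kind, the paper's results predict a region count on the order of the total number of neurons for both architectures, far below the exponential worst case.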
Methodology
The authors combine theoretical tools with computational experiments. They employ co-area formulas and probabilistic methods to derive bounds on the expected number of linear regions and on the distance from a typical input to the nearest region boundary. For empirical validation, they train ReLU networks on the MNIST dataset and examine how the linear regions evolve over the course of training.
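For reference, the boundary-volume analysis rests on the co-area formula, stated here in its standard textbook form rather than the paper's exact notation: for a Lipschitz function $f:\mathbb{R}^n \to \mathbb{R}$ and a measurable set $\Omega \subseteq \mathbb{R}^n$,

$$\int_{\Omega} \lVert \nabla f(x) \rVert \, dx \;=\; \int_{-\infty}^{\infty} \mathcal{H}^{n-1}\!\bigl(\Omega \cap f^{-1}(t)\bigr)\, dt,$$

where $\mathcal{H}^{n-1}$ denotes the $(n-1)$-dimensional Hausdorff measure of the level set $f^{-1}(t)$. Identities of this kind allow the $(n-1)$-dimensional volume of region boundaries to be traded for gradient-based quantities that can be averaged over random initializations.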
Theoretical Implications
This paper's results challenge several pre-existing hypotheses about neural network architectures. They temper the often-presumed benefits of depth for region complexity, suggesting that deep and shallow networks with comparable numbers of neurons behave similarly in practice. This has implications for understanding network capacity and function interpolation, shifting the focus toward the total number of neurons and how they are distributed across layers, rather than depth alone.
Practical Implications and Future Directions
From a practical standpoint, these findings suggest that enhancements in neural network performance may not necessarily come from heightened expressivity. Instead, focusing on improving learning algorithms and initialization strategies could leverage existing network architectures more effectively. Additionally, the paper raises interesting questions about robustness and generalization, suggesting that practical advances could come from tuning existing architectures rather than increasing their complexity.
Future research may delve into more complex or varied non-linear activations, alternative architectural configurations, or a deeper investigation into how training protocols impact the evolution of linear regions. The intersection of this work with adversarial examples could provide further insights into network vulnerabilities and stability.
In summary, "Complexity of Linear Regions in Deep Networks" contributes a detailed analysis and understanding of neural network expressivity that questions current assumptions, offering a refined perspective on how network architecture influences computational capacity. The work sets the stage for potential refinements in neural network design, training methodologies, and theoretical modeling to better harness the power of deep learning.