
The power of deeper networks for expressing natural functions (1705.05502v2)

Published 16 May 2017 in cs.LG, cs.NE, and stat.ML

Abstract: It is well-known that neural networks are universal approximators, but that deeper networks tend in practice to be more powerful than shallower ones. We shed light on this by proving that the total number of neurons $m$ required to approximate natural classes of multivariate polynomials of $n$ variables grows only linearly with $n$ for deep neural networks, but grows exponentially when merely a single hidden layer is allowed. We also provide evidence that when the number of hidden layers is increased from $1$ to $k$, the neuron requirement grows exponentially not with $n$ but with $n^{1/k}$, suggesting that the minimum number of layers required for practical expressibility grows only logarithmically with $n$.

Citations (170)

Summary

  • The paper proves that deeper networks require exponentially fewer neurons than shallow networks to efficiently approximate certain multivariate polynomials.
  • Practically, increasing network depth allows approximating functions with significantly fewer neurons, guiding more efficient neural network architectural design.
  • Theoretically, these findings support depth-first architectural choices and highlight the distinction between efficient function expression and the challenge of efficient learning.

Analysis of "The Power of Deeper Networks for Expressing Natural Functions" by Rolnick and Tegmark

Deep learning models, particularly feedforward neural networks, have captivated the research community with their ability to approximate complex functions. The paper by Rolnick and Tegmark takes a rigorous approach to examine why deeper architectures outperform shallower networks in approximating natural function classes. Historically noted as universal approximators, neural networks leverage depth to enhance their representational capability, something this paper explores through mathematical proof and empirical verification.

Key Findings

The paper establishes that the neuron count required to approximate specific multivariate polynomials scales very differently for deep networks than for shallow ones:

  • Efficiency of Depth: Deep networks approximate these multivariate polynomials with a neuron count that grows only linearly with the number of variables $n$, as opposed to exponentially for a shallow, single-hidden-layer network. The authors further provide evidence that when the depth is increased from $1$ to $k$ hidden layers, the neuron requirement grows exponentially with $n^{1/k}$ rather than with $n$ (see the summary after this list). This indicates that deep networks retain expressibility even as the number of input variables increases.
  • Resource Requirements: The efficiency results are established for uniform approximation rather than only for Taylor approximation, which broadens their applicability to standard feedforward architectures as they are actually used.
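The scaling contrast can be summarized compactly. Taking the product monomial $p(\mathbf{x}) = x_1 x_2 \cdots x_n$ as the canonical hard case, and writing $m_k(p)$ for the number of neurons required with $k$ hidden layers, the asymptotic picture below paraphrases the abstract; the notation is illustrative and the paper's exact constants are not reproduced here.

```latex
\begin{aligned}
m_1(p) &= \exp\bigl(\Theta(n)\bigr)
  && \text{single hidden layer: proved to be exponential in } n, \\
m_k(p) &\approx \exp\bigl(\Theta(n^{1/k})\bigr)
  && \text{$k$ hidden layers: supported by evidence rather than a full proof,} \\
m(p) &= \mathcal{O}(n)
  && \text{depth allowed to grow with $n$, using roughly $\log_2 n$ layers.}
\end{aligned}
```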

The proofs rely heavily on the compositional structure of the polynomials under consideration and demonstrate an exponential advantage in representational capacity for networks that exploit depth.
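To make the compositional idea concrete, here is a minimal sketch of the kind of construction such proofs build on: the product of $n$ inputs is computed by a binary tree of pairwise multiplications, each approximated by a single layer of four neurons via a small-weight Taylor expansion of the activation. The softplus activation, the value of the small parameter `eps`, and the function names are choices made here for illustration, not the paper's exact construction.

```python
import numpy as np

def softplus(u):
    # Numerically stable softplus: log(1 + exp(u)).
    return np.logaddexp(0.0, u)

SIGMA_PP_0 = 0.25  # second derivative of softplus at 0 (must be nonzero for the gadget)

def approx_mul(x, y, eps=1e-2):
    """Approximate x*y with one hidden layer of 4 softplus neurons.

    Small-weight Taylor trick: as eps -> 0,
    sigma(eps(x+y)) + sigma(-eps(x+y)) - sigma(eps(x-y)) - sigma(-eps(x-y))
      ~ 4 * sigma''(0) * eps^2 * x * y.
    """
    s = eps * (x + y)
    d = eps * (x - y)
    num = softplus(s) + softplus(-s) - softplus(d) - softplus(-d)
    return num / (4.0 * SIGMA_PP_0 * eps**2)

def deep_product(xs, eps=1e-2):
    """Approximate x_1 * ... * x_n with a binary tree of approx_mul gadgets.

    Each tree level is one multiplication layer, so the depth is about
    log2(n) and the hidden-neuron count is about 4*(n-1), i.e. linear in n,
    in contrast to the exponential cost with a single hidden layer.
    """
    layer = list(xs)
    while len(layer) > 1:
        nxt = [approx_mul(layer[i], layer[i + 1], eps)
               for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:        # an odd leftover input passes through
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.uniform(0.5, 1.5, size=8)
    print("exact  :", np.prod(xs))
    print("network:", deep_product(xs, eps=1e-2))
```

Because each level of the tree is one multiplication layer, the depth is about $\log_2 n$ and the total hidden-neuron budget is about $4(n-1)$, which is the linear scaling the paper attributes to deep networks.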

Implications

Practical Implications

From a practical standpoint, this suggests optimized architectures: networks tasked with approximating such structured functions can reduce their total neuron count by increasing depth rather than width. The paper's insight that the number of layers need only grow logarithmically with the number of variables offers a heuristic for architectural design, signaling reduced computation and memory consumption for deeper networks.
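As a rough illustration of this heuristic, the snippet below tabulates, for the binary-tree construction sketched earlier, how depth and neuron budget grow with the number of inputs $n$, alongside the order-$2^n$ neuron count associated with a single hidden layer; the four-neurons-per-multiplication constant is an assumption carried over from that sketch.

```python
import math

def binary_tree_budget(n):
    """Depth and hidden-neuron count for the illustrative binary-tree
    product construction (four neurons per pairwise multiplication)."""
    depth = math.ceil(math.log2(n))   # one multiplication layer per tree level
    neurons = 4 * (n - 1)             # one 4-neuron gadget per internal node
    return depth, neurons

for n in (8, 64, 1024):
    depth, neurons = binary_tree_budget(n)
    # A single hidden layer needs on the order of 2**n neurons for the
    # product of n variables, per the paper's exponential result.
    print(f"n={n:5d}  depth ~ {depth:2d}  deep neurons ~ {neurons:6d}  "
          f"single layer ~ 2^{n}")
```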

Theoretical Implications

Theoretically, these findings add nuance to our understanding of neural networks' expressibility beyond the universal approximation theorem: depth changes the resources required to express natural function classes, which supports architectural choices grounded in structural considerations such as depth rather than in the mere existence of an approximator.

Areas for Further Exploration

The paper opens avenues for future exploration in assessing the computational resources required for learning, as distinct from those required for expressing a function. While the paper validates that deep networks can express these functions efficiently, the harder question of whether such representations can be learned efficiently remains open, especially under varying dataset constraints and training paradigms.

Moreover, probing architectures such as residual networks or unitary nets might further elucidate how depth can be leveraged while avoiding optimization issues such as vanishing or exploding gradients. A deeper understanding here could combine the benefits of theoretical expressiveness with practical ease of training.

Overall, Rolnick and Tegmark's exposition sheds light on the fundamental importance of depth in neural networks, articulating a detailed narrative on which further advances in AI architecture can build. The paper informs the strategic design of machine learning architectures that balance expressiveness with computational efficiency, channeling resources judiciously in the ongoing exploration of artificial neural networks.