- The paper demonstrates that deep neural networks approximate piecewise smooth functions with exponentially fewer neurons than shallow networks.
- It establishes that deep architectures need only O(polylog(1/ε)) neurons to approximate univariate functions to accuracy ε, versus the poly(1/ε) neurons required by shallow networks.
- The results extend to classes of multivariate functions, and lower bounds are derived for networks built from ReLUs and binary step units, underlining the efficiency gained from increased depth.
Analyzing Function Approximation with Deep and Shallow Neural Networks
The paper, "Why Deep Neural Networks for Function Approximation?" by Shiyu Liang and R. Srikant, addresses a critical question in the field of machine learning — why deep neural networks are typically favored over their shallow counterparts when approximating functions. The authors offer robust theoretical contributions that highlight the relative efficiency of deep neural networks in approximating a broad class of piecewise smooth functions, which require exponentially more neurons in a shallow network to achieve a similar approximation accuracy.
Summary of Contributions
The paper systematically explores the approximation capabilities of shallow versus deep architectures. Some of the core contributions can be summarized as follows:
- Efficiency in Neuron Usage: For univariate functions on a bounded interval, deep neural networks require only O(polylog(1/ε)) neurons, compared to the Ω(poly(1/ε)) neurons needed by shallow networks to attain a uniform approximation error of ε over the interval. For a fixed precision, a deep architecture can therefore achieve the task with exponentially fewer neurons (see the numerical sketch after this list).
- Extension to Multivariate Functions: The results extend to certain important classes of multivariate functions, for which the authors show the same pattern: deep networks achieve accuracy ε with exponentially fewer neurons than shallow networks.
- Lower Bounds for Network Size: For strongly convex functions, a lower bound is derived on the number of neurons needed by networks composed of ReLUs and binary step units: approximating such a function to error ε requires Ω(log(1/ε)) neurons, indicating that the paper's deep constructions are essentially tight.
- Implications of Depth: The paper shows that shallow networks, whose depth does not grow with the required precision, need sizes that grow polynomially in 1/ε to maintain approximation accuracy; this contrasts with the polylogarithmic scaling achievable once the depth is allowed to grow with the precision.
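To make the size gap concrete, here is a minimal numerical sketch (not the paper's construction) comparing two approximation schemes for the strongly convex function f(x) = x^2 on [0, 1]: a quantize-then-square scheme that mimics the bit-extraction flavor of the deep constructions, using one unit per extracted bit so the error shrinks like 2^-n, and a piecewise-linear interpolant standing in for a one-hidden-layer ReLU network with one breakpoint per hidden unit, whose error shrinks like 1/n^2. The function, the evaluation grid, and both schemes are choices made here purely for illustration.

```python
import numpy as np

# Illustrative comparison (not the paper's construction): approximate
# f(x) = x^2 on [0, 1] and track the uniform error as "units" are added.
#   - deep-style: quantize x to n binary digits, then square the quantized
#     value; one binary step unit per bit, error decays like 2**(-n).
#   - shallow-style: piecewise-linear interpolation with n segments, an
#     optimistic stand-in for a one-hidden-layer ReLU network with n units;
#     error decays like 1 / n**2.

def f(x):
    return x ** 2

xs = np.linspace(0.0, 1.0, 100_001)  # dense grid for measuring uniform error

def deep_style_error(n_units):
    """Error of squaring an n_units-bit quantization of x (exponential decay)."""
    xq = np.floor(xs * 2 ** n_units) / 2 ** n_units
    return np.max(np.abs(f(xs) - f(xq)))

def shallow_style_error(n_units):
    """Error of piecewise-linear interpolation with n_units pieces (1/n^2 decay)."""
    knots = np.linspace(0.0, 1.0, n_units + 1)
    return np.max(np.abs(f(xs) - np.interp(xs, knots, f(knots))))

for n in [4, 8, 16, 32]:
    print(f"units={n:3d}  deep-style err={deep_style_error(n):.2e}  "
          f"shallow-style err={shallow_style_error(n):.2e}")
```

Under these toy error rates, reaching ε ≈ 10^-6 takes roughly 20 units in the exponentially decaying scheme but several hundred in the 1/n^2 scheme, which is the polylog(1/ε)-versus-poly(1/ε) gap that the theorems quantify.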
Implications and Future Work
The findings in this paper have broad implications for the design and application of neural networks in complex machine learning tasks. The poly(1/ε) size forced on shallow networks points to a fundamental inefficiency and reinforces the empirical prominence of deep networks in real-world applications. Furthermore, the theoretical foundations laid out here could inform more efficient training algorithms, architecture design, and scaling laws for neural networks.
As for future directions, researchers might extend these results to additional classes of functions, including non-smooth functions or functions arising from stochastic processes, to probe the limits of deep learning in even more complex scenarios. Moreover, architectures and activation functions beyond ReLUs and binary step units could be analyzed to tailor neural networks more closely to particular tasks or data structures.
In conclusion, Liang and Srikant's research provides valuable insights into the tangible benefits of deep neural networks, going beyond previous empirical observations to deliver a compelling theoretical justification for their widespread use in function approximation. This work serves as a cornerstone for future innovations and optimizations in neural network design and application.