
Approximating Continuous Functions by ReLU Nets of Minimal Width (1710.11278v2)

Published 31 Oct 2017 in stat.ML, cs.CC, cs.LG, math.CO, math.ST, and stat.TH

Abstract: This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.

Authors (2)
  1. Boris Hanin (50 papers)
  2. Mark Sellke (57 papers)
Citations (215)

Summary

  • The paper establishes that with width d_in+1, ReLU nets can universally approximate any continuous function on the d_in-dimensional unit cube.
  • It rigorously demonstrates that networks with width d_in are insufficient for full approximation, highlighting the critical role of minimal width.
  • The study extends the results to multi-dimensional outputs, offering practical guidelines for neural network architecture design.

Approximating Continuous Functions by ReLU Nets of Minimal Width

The paper by Hanin and Sellke explores the fundamental properties of deep feed-forward neural networks with ReLU activation, specifically focusing on the networks' ability to approximate continuous functions given constraints on layer width. The authors address the question of the minimal width necessary for such networks to approximate any continuous, real-valued function of a given input dimension $d_{in}$ to arbitrary precision. The pivotal result is that for ReLU nets with width $d_{in}+1$, any continuous function defined on the $d_{in}$-dimensional unit cube can be approximated arbitrarily well, highlighting the critical role of width in determining the network's expressivity.

Core Results

The key finding of the paper is that the minimal width necessary to approximate any continuous function on $[0,1]^{d_{in}}$ is precisely $d_{in}+1$. This is a robust result implying that with widths limited to $d_{in}$, even infinitely deep networks cannot approximate all functions in this class. Conversely, increasing every hidden layer width just beyond this threshold enables universal approximation within this functional space. Furthermore, the authors extend the analysis to continuous functions mapping to higher-dimensional outputs, showing that the required width generalizes to $d_{in}+d_{out}$ for output dimension $d_{out}$, again underscoring the role of minimal layer capacity in approximation ability.
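
To make this architecture class concrete, below is a minimal PyTorch sketch (not the paper's explicit construction) of a deep feed-forward ReLU net in which every hidden layer has width $d_{in}+d_{out}$; the helper name `build_narrow_relu_net` and the chosen depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_narrow_relu_net(d_in: int, d_out: int, depth: int) -> nn.Sequential:
    """Feed-forward ReLU net whose hidden layers all have width d_in + d_out."""
    width = d_in + d_out
    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d_out))  # affine output layer, no activation
    return nn.Sequential(*layers)

# Example: a scalar-valued function of 3 variables -> hidden width 4 = d_in + d_out
net = build_narrow_relu_net(d_in=3, d_out=1, depth=64)
x = torch.rand(16, 3)   # points in the unit cube [0, 1]^3
y = net(x)              # output of shape (16, 1)
```

The paper's theorem says that nets of exactly this shape, for sufficiently large `depth`, can approximate any continuous $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ to any desired accuracy.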

The authors provide a rigorous mathematical construction and proofs, detailing how a ReLU network with these specified widths can be constructed to approximate a given continuous function. Quantitative estimates of the network depth are given in terms of the modulus of continuity of the function being approximated, providing insight into how network depth scales with the function's complexity and continuity characteristics.
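
For reference, the modulus of continuity that governs these depth estimates is the standard quantity below; the guarantee is written only schematically as a sup-norm bound over the unit cube (the paper's exact dependence of depth on $\omega_f$ is not reproduced here).

```latex
% Standard modulus of continuity of f on the unit cube
\omega_f(\delta) \;=\; \sup\big\{\, \|f(x)-f(y)\| \;:\; x,y \in [0,1]^{d_{in}},\ \|x-y\| \le \delta \,\big\}

% Schematic form of the approximation guarantee: for every \varepsilon > 0 there is
% a ReLU net N with all hidden widths equal to d_{in} + d_{out} such that
\sup_{x \in [0,1]^{d_{in}}} \|f(x) - N(x)\| \;\le\; \varepsilon,
% with the required depth of N controlled in terms of \omega_f and \varepsilon.
```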

Theoretical Contribution

This work contributes to the theoretical understanding of neural network expressivity, specifically elucidating the 'expressive power of depth'—how depth relates to network capacity when parameter restrictions are in place. The results build upon earlier universal approximation theorems by imposing practical constraints on network architecture and exploring their implications.

A significant implication concerns network design: minimal-width guarantees for universal approximation allow practitioners to predetermine layer widths sufficient for specific tasks, assuming adequate depth. This balance between width and depth contributes to a refined understanding of the trade-off that practitioners can exploit in architecture design.
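
As a rough illustration of this guideline (not the paper's construction, which is explicit rather than learned), one can fit a width-$(d_{in}+d_{out})$ ReLU net to a simple continuous target by gradient descent; the target function, depth, and optimizer settings below are arbitrary choices made for the example.

```python
import torch
import torch.nn as nn

# Illustration only: the theorem guarantees that an approximator of this width
# exists; whether gradient descent actually finds it is a separate, empirical question.
d_in, d_out, depth = 2, 1, 16
width = d_in + d_out                           # = 3, the minimal width from the paper

layers = [nn.Linear(d_in, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers.append(nn.Linear(width, d_out))
net = nn.Sequential(*layers)

# A simple continuous target on the unit square (arbitrary choice).
target = lambda x: torch.sin(3.0 * x[:, :1]) * torch.cos(2.0 * x[:, 1:2])

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, d_in)                  # sample points in [0, 1]^2
    loss = ((net(x) - target(x)) ** 2).mean()  # MSE as a proxy for sup-norm error
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.4f}")
```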

Discussion and Future Directions

The results stimulate further exploration into the roles of other factors, such as specific activation functions or network sparsity, in approximation power, especially in view of novel neural architectures such as residual networks or networks incorporating skip connections. In this vein, understanding how skip connections might alter these width constraints, or how different non-linear activations affect representational capabilities, remains an open avenue of inquiry. Additionally, examining how such theoretical constraints manifest empirically, where ideal conditions are seldom met, presents another significant future direction.

In conclusion, Hanin and Sellke contribute a critical piece towards demystifying neural network operations, focusing on the definition of minimal architectural parameters necessary for broad functional efficacy. Their work offers foundational insights that align theoretical properties with the practical demands and designs of neural network-based solutions in various domains.
