
Approximating Continuous Functions by ReLU Nets of Minimal Width (1710.11278v2)

Published 31 Oct 2017 in stat.ML, cs.CC, cs.LG, math.CO, math.ST, and stat.TH

Abstract: This article concerns the expressive power of depth in deep feed-forward neural nets with ReLU activations. Specifically, we answer the following question: for a fixed $d_{in}\geq 1,$ what is the minimal width $w$ so that neural nets with ReLU activations, input dimension $d_{in}$, hidden layer widths at most $w,$ and arbitrary depth can approximate any continuous, real-valued function of $d_{in}$ variables arbitrarily well? It turns out that this minimal width is exactly equal to $d_{in}+1.$ That is, if all the hidden layer widths are bounded by $d_{in}$, then even in the infinite depth limit, ReLU nets can only express a very limited class of functions, and, on the other hand, any continuous function on the $d_{in}$-dimensional unit cube can be approximated to arbitrary precision by ReLU nets in which all hidden layers have width exactly $d_{in}+1.$ Our construction in fact shows that any continuous function $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ can be approximated by a net of width $d_{in}+d_{out}$. We obtain quantitative depth estimates for such an approximation in terms of the modulus of continuity of $f$.

Authors (2)
  1. Boris Hanin (50 papers)
  2. Mark Sellke (57 papers)
Citations (215)

Summary

  • The paper establishes that with width d_in+1, ReLU nets can universally approximate any continuous function on the d_in-dimensional unit cube.
  • It rigorously demonstrates that networks with width d_in are insufficient for full approximation, highlighting the critical role of minimal width.
  • The study extends the results to multi-dimensional outputs, offering practical guidelines for neural network architecture design.

Approximating Continuous Functions by ReLU Nets of Minimal Width

The paper by Hanin and Sellke explores the fundamental properties of deep feed-forward neural networks with ReLU activation, specifically focusing on the networks' ability to approximate continuous functions given constraints on layer width. The authors address the question of the minimal width necessary for such networks to approximate any continuous, real-valued function of a given input dimension $d_{in}$ to arbitrary precision. The pivotal result is that for ReLU nets with width $d_{in}+1$, any continuous function defined on the $d_{in}$-dimensional unit cube can be approximated arbitrarily well, highlighting the critical role of width in determining the network's expressivity.

Core Results

The key finding of the paper is that the minimal width necessary to approximate any continuous function on $[0,1]^{d_{in}}$ is precisely $d_{in}+1$. This is a robust result implying that with widths limited to $d_{in}$, even infinitely deep networks cannot approximate all functions in this class. Conversely, increasing every hidden layer width just beyond this threshold enables universal approximation within this functional space. Furthermore, the authors extend the analysis to continuous functions mapping to higher-dimensional outputs, showing that the required width generalizes to $d_{in}+d_{out}$ for output dimension $d_{out}$, again underscoring the role of minimal layer capacity in approximation ability.
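
To make this architecture class concrete, below is a minimal PyTorch sketch (not the paper's explicit construction) of a deep feed-forward ReLU net in which every hidden layer has width $d_{in}+d_{out}$; the helper name `build_narrow_relu_net` and the chosen depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_narrow_relu_net(d_in: int, d_out: int, depth: int) -> nn.Sequential:
    """Feed-forward ReLU net whose hidden layers all have width d_in + d_out."""
    width = d_in + d_out
    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d_out))  # affine output layer, no activation
    return nn.Sequential(*layers)

# Example: a scalar-valued function of 3 variables -> hidden width 4 = d_in + d_out
net = build_narrow_relu_net(d_in=3, d_out=1, depth=64)
x = torch.rand(16, 3)   # points in the unit cube [0, 1]^3
y = net(x)              # output of shape (16, 1)
```

The paper's theorem says that nets of exactly this shape, for sufficiently large `depth`, can approximate any continuous $f:[0,1]^{d_{in}}\to\mathbb R^{d_{out}}$ to any desired accuracy.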

The authors provide a rigorous mathematical construction and proofs, detailing how a ReLU network with these specified widths can be constructed to approximate a given continuous function. Quantitative estimates of the network depth are given in terms of the modulus of continuity of the function being approximated, providing insight into how network depth scales with the function's complexity and continuity characteristics.
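
For reference, the modulus of continuity that governs these depth estimates is the standard quantity below; the guarantee is written only schematically as a sup-norm bound over the unit cube (the paper's exact dependence of depth on $\omega_f$ is not reproduced here).

```latex
% Standard modulus of continuity of f on the unit cube
\omega_f(\delta) \;=\; \sup\big\{\, \|f(x)-f(y)\| \;:\; x,y \in [0,1]^{d_{in}},\ \|x-y\| \le \delta \,\big\}

% Schematic form of the approximation guarantee: for every \varepsilon > 0 there is
% a ReLU net N with all hidden widths equal to d_{in} + d_{out} such that
\sup_{x \in [0,1]^{d_{in}}} \|f(x) - N(x)\| \;\le\; \varepsilon,
% with the required depth of N controlled in terms of \omega_f and \varepsilon.
```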

Theoretical Contribution

This work contributes to the theoretical understanding of neural network expressivity, specifically elucidating the 'expressive power of depth'—how depth relates to network capacity when parameter restrictions are in place. The results build upon earlier universal approximation theorems by imposing practical constraints on network architecture and exploring their implications.

A significant implication concerns network design: minimal-width guarantees for universal approximation allow practitioners to predetermine layer widths sufficient for specific tasks, assuming adequate depth. This balance between width and depth contributes to a refined understanding of the trade-off that practitioners can exploit in architecture design.
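
As a rough illustration of this guideline (not the paper's construction, which is explicit rather than learned), one can fit a width-$(d_{in}+d_{out})$ ReLU net to a simple continuous target by gradient descent; the target function, depth, and optimizer settings below are arbitrary choices made for the example.

```python
import torch
import torch.nn as nn

# Illustration only: the theorem guarantees that an approximator of this width
# exists; whether gradient descent actually finds it is a separate, empirical question.
d_in, d_out, depth = 2, 1, 16
width = d_in + d_out                           # = 3, the minimal width from the paper

layers = [nn.Linear(d_in, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers.append(nn.Linear(width, d_out))
net = nn.Sequential(*layers)

# A simple continuous target on the unit square (arbitrary choice).
target = lambda x: torch.sin(3.0 * x[:, :1]) * torch.cos(2.0 * x[:, 1:2])

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, d_in)                  # sample points in [0, 1]^2
    loss = ((net(x) - target(x)) ** 2).mean()  # MSE as a proxy for sup-norm error
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training MSE: {loss.item():.4f}")
```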

Discussion and Future Directions

The results stimulate further exploration into the roles of other factors, such as specific activation functions or network sparsity, in approximation power, especially in view of novel neural architectures such as residual networks or networks incorporating skip connections. In this vein, understanding how skip connections might alter these width constraints, or how different non-linear activations affect representational capabilities, remains an open avenue of inquiry. Additionally, examining how such theoretical constraints manifest empirically, where ideal conditions are seldom met, presents another significant future direction.

In conclusion, Hanin and Sellke contribute a critical piece towards demystifying neural network operations, focusing on the definition of minimal architectural parameters necessary for broad functional efficacy. Their work offers foundational insights that align theoretical properties with the practical demands and designs of neural network-based solutions in various domains.
