Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation (2305.18460v3)
Abstract: The study of universal approximation properties (UAP) for neural networks (NN) has a long history. When the network width is unlimited, a single hidden layer is sufficient for UAP. In contrast, when the depth is unlimited, the width for UAP needs to be no less than the critical width $w^*_{\min}=\max(d_x,d_y)$, where $d_x$ and $d_y$ are the dimensions of the input and output, respectively. Recently, Cai (2022) showed that a leaky-ReLU NN with this critical width can achieve UAP for $L^p$ functions on a compact domain $K$, i.e., the UAP for $L^p(K,\mathbb{R}^{d_y})$. This paper examines the uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x,d_y)+\Delta(d_x,d_y)$, where $\Delta(d_x,d_y)$ is the number of additional dimensions required to approximate continuous functions by diffeomorphisms via embedding. To obtain this result, we propose a novel lift-flow-discretization approach, which shows that the uniform UAP has a deep connection with topological theory.
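To make the width constraint concrete, below is a minimal NumPy sketch of the network family the result concerns: a deep leaky-ReLU MLP in which every hidden layer has the fixed width $\max(d_x,d_y)+\Delta$. The function name, the value of `delta`, and the random weights are illustrative placeholders, not the paper's construction; the sketch only shows the shape of the architecture, not how the approximation is achieved.

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    # Leaky ReLU: identity on non-negatives, slope alpha on negatives.
    return np.where(z >= 0, z, alpha * z)

def narrow_leaky_relu_net(x, d_y, delta, depth=8, seed=0):
    """Forward pass of a deep, narrow leaky-ReLU MLP whose hidden
    layers all have the fixed width w = max(d_x, d_y) + delta.
    Weights are random placeholders, not the paper's construction."""
    d_x = x.shape[-1]
    w = max(d_x, d_y) + delta                  # fixed hidden width
    rng = np.random.default_rng(seed)
    h = x @ rng.standard_normal((d_x, w))      # lift input to width w
    for _ in range(depth):                     # depth is unconstrained
        h = leaky_relu(h @ rng.standard_normal((w, w)))
    return h @ rng.standard_normal((w, d_y))   # project to d_y outputs

# Example: d_x = 3, d_y = 2, one extra dimension (delta = 1) -> width 4.
y = narrow_leaky_relu_net(np.ones((5, 3)), d_y=2, delta=1)
print(y.shape)  # (5, 2)
```

The point of the sketch is that the only free resource is `depth`; the hidden width is pinned at the critical value, which is what makes the topological obstructions (and hence the extra $\Delta$ dimensions) relevant.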
- Barron, A. R. Approximation and estimation bounds for artificial neural networks. Machine Learning, 14(1):115–133, 1994.
- Beise, H.-P. and Da Cruz, S. D. Expressiveness of Neural Networks Having Width Equal or Below the Input Dimension. arXiv preprint arXiv:2011.04923, 2020.
- Beise, H.-P., Da Cruz, S. D., and Schommer, U. On decision regions of narrow deep neural networks. Neural Networks, 140:121–129, 2021.
- Brenier, Y. and Gangbo, W. $L^p$ Approximation of maps by diffeomorphisms. Calculus of Variations and Partial Differential Equations, 16(2):147–164, 2003.
- Cai, Y. Achieve the minimum width of neural networks for universal approximation. arXiv preprint arXiv:2209.11395, 2022.
- Caponigro, M. Orientation preserving diffeomorphisms and flows of control-affine systems. IFAC Proceedings Volumes, 44(1):8016–8021, 2011.
- Chong, K. F. E. A closer look at the approximation capabilities of neural networks. In International Conference on Learning Representations, 2020.
- Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
- Daniely, A. Depth separation for neural networks. In Conference on Learning Theory, 2017.
- Duan, Y., Li, L., Ji, G., and Cai, Y. Vanilla feedforward neural networks as a discretization of dynamic systems. arXiv preprint arXiv:2209.10909, 2022.
- Hanin, B. and Sellke, M. Approximating Continuous Functions by ReLU Nets of Minimal Width. arXiv preprint arXiv:1710.11278, 2018.
- Hirsch, M. W. Differential Topology. Springer New York, 1976.
- Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
- Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.
- Huang, C.-W., Krueger, D., Lacoste, A., and Courville, A. Neural Autoregressive Flows. In International Conference on Machine Learning, 2018.
- Hwang, G. Minimum width for deep, narrow MLP: A diffeomorphism and the Whitney embedding theorem approach. arXiv preprint arXiv:2308.15873, 2023.
- Johnson, J. Deep, Skinny Neural Networks are not Universal Approximators. In International Conference on Learning Representations, 2019.
- Kim, N., Min, C., and Park, S. Minimum width for universal approximation using ReLU networks on compact domain. arXiv preprint arXiv:2309.10402, 2023.
- Kong, Z. and Chaudhuri, K. Universal Approximation of Residual Flows in Maximum Mean Discrepancy. International Conference on Machine Learning Workshop, 2021.
- Le Roux, N. and Bengio, Y. Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Neural Computation, 20(6):1631–1649, 2008.
- Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993.
- Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The Expressive Power of Neural Networks: A View from the Width. In Neural Information Processing Systems, 2017.
- Montufar, G. F. Universal approximation depth and errors of narrow belief networks with discrete units. Neural Computation, 26(7):1386–1407, 2014.
- Nguyen, Q., Mukkamala, M. C., and Hein, M. Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions. In International Conference on Machine Learning, 2018.
- Park, S., Yun, C., Lee, J., and Shin, J. Minimum Width for Universal Approximation. In International Conference on Learning Representations, 2021.
- Rousseau, F. and Fablet, R. Residual networks as geodesic flows of diffeomorphisms. arXiv preprint arXiv:1805.09585, 2018.
- Ruiz-Balet, D. and Zuazua, E. Neural ODE control for classification, approximation and transport. arXiv preprint arXiv:2104.05278, 2021.
- Sutskever, I. and Hinton, G. E. Deep, Narrow Sigmoid Belief Networks Are Universal Approximators. Neural Computation, 20(11):2629–2636, 2008.
- Tabuada, P. and Gharesifard, B. Universal approximation power of deep residual neural networks through the lens of control. IEEE Transactions on Automatic Control, 68, 2023.
- Telgarsky, M. Benefits of depth in neural networks. In Conference on Learning Theory, 2016.
- Teshima, T., Ishikawa, I., Tojo, K., Oono, K., Ikeda, M., and Sugiyama, M. Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators. In Neural Information Processing Systems, 2020a.
- Teshima, T., Tojo, K., Ikeda, M., Ishikawa, I., and Oono, K. Universal Approximation Property of Neural Ordinary Differential Equations. Neural Information Processing Systems 2020 Workshop on Differential Geometry meets Deep Learning, 2020b.
- Whitney, H. The Self-Intersections of a Smooth n-Manifold in 2n-Space. Annals of Mathematics, 45(2):220–246, 1944.
- Zhang, H., Gao, X., Unterman, J., and Arodz, T. Approximation Capabilities of Neural ODEs and Invertible Residual Networks. In International Conference on Machine Learning, 2020.