Random ReLU Neural Networks as Non-Gaussian Processes

(2405.10229)
Published May 16, 2024 in stat.ML, cs.LG, and math.PR

Abstract

We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).

Overview

  • The paper studies shallow neural networks with randomly initialized parameters and ReLU activation functions, establishing them as well-defined non-Gaussian processes.

  • Key findings include isotropy, wide-sense self-similarity with a Hurst exponent of 3/2, and a simple closed-form expression for the autocovariance function.

  • The research highlights implications for AI modeling and Bayesian inference, emphasizing that the law of the weights at initialization determines whether wide networks converge to Gaussian or non-Gaussian processes.

Introduction

This paper explores the behavior of shallow neural networks with randomly initialized parameters and ReLU (Rectified Linear Unit) activation functions. In contrast to the Gaussian processes that arise in classical infinite-width analyses, these random ReLU neural networks are shown to be well-defined non-Gaussian processes. The research examines their properties, offers a new perspective by adopting a non-asymptotic viewpoint in which the width is finite and random, and presents both theoretical and practical implications for AI research and development.

Key Findings

ReLU Networks and Well-Defined Processes

The researchers prove that shallow ReLU neural networks with random parameters are well-defined non-Gaussian processes. These networks, referred to as random ReLU neural networks, exhibit properties determined by the law of their weights and biases and by the density of activation thresholds in each bounded region of the input domain. Highlights include:

  • Isotropic: The statistics of these processes are invariant under rotations of the input domain.
  • Wide-Sense Self-Similar: Their second-order statistics scale consistently under dilations of the input, with a Hurst exponent of 3/2 (both invariances are spelled out in the short sketch after this list).
  • Closed-Form Autocovariance: The autocovariance function admits a remarkably simple closed-form expression.
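
For concreteness, here is a minimal sketch of what isotropy and wide-sense self-similarity mean at the level of the autocovariance function. The notation $C(x, y)$ for the autocovariance and the exact form of these statements are standard second-order definitions, not formulas quoted from the paper.

```latex
% Let s = (s(x))_{x \in \mathbb{R}^d} denote the random ReLU network viewed as a
% stochastic process, with autocovariance C(x, y) = \mathrm{Cov}(s(x), s(y)).

% Isotropy: the second-order statistics are invariant under rotations of the input.
\[
  C(\mathbf{R}x, \mathbf{R}y) = C(x, y), \qquad \mathbf{R} \in \mathrm{SO}(d).
\]

% Wide-sense self-similarity with Hurst exponent H = 3/2: dilating the inputs by
% a > 0 rescales the autocovariance by a^{2H} = a^{3}.
\[
  C(ax, ay) = a^{2H}\, C(x, y) = a^{3}\, C(x, y), \qquad a > 0.
\]
```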

Technical Details

The processes are characterized by stochastic differential equations driven by impulsive white noise. The number of neurons in each bounded region is a random variable following a Poisson distribution with a mean proportional to the density parameter.
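
To make this construction concrete, here is a minimal simulation sketch of a one-dimensional shallow random ReLU network whose width on a bounded interval is Poisson with mean proportional to a density parameter. The specific form f(x) = sum_k v_k ReLU(x - b_k), the uniform law for the thresholds b_k, and the default Gaussian law for the weights v_k are illustrative assumptions; the paper's construction is more general (multivariate inputs and essentially arbitrary weight laws).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_random_relu_network(lam=50.0, domain=(-1.0, 1.0), weight_sampler=None):
    """Draw one realization of a toy 1D shallow random ReLU network.

    The number of neurons whose activation thresholds fall in `domain` is
    Poisson with mean lam * |domain|, the thresholds are uniform on the domain,
    and the output weights are i.i.d. from `weight_sampler` (standard normal by
    default). This is an illustrative parameterization, not the paper's exact
    construction.
    """
    a, b = domain
    if weight_sampler is None:
        weight_sampler = rng.standard_normal
    k = rng.poisson(lam * (b - a))          # random width on the bounded region
    thresholds = rng.uniform(a, b, size=k)  # activation thresholds b_k
    weights = weight_sampler(k)             # output weights v_k

    def f(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
        return np.sum(weights * np.maximum(x - thresholds, 0.0), axis=1)

    return f

# Evaluate one realization on a grid of inputs.
xs = np.linspace(-1.0, 1.0, 201)
f = sample_random_relu_network()
print(f(xs)[:5])
```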

Asymptotic Behaviors

One of the paper's significant contributions is showing that, as the expected width tends to infinity, these networks can converge to both Gaussian and non-Gaussian processes, depending on the law of the weights (a toy simulation contrasting the two regimes follows this list). Specifically:

  • For Gaussian-distributed weights, the processes converge to classic Gaussian processes.
  • For weights following a symmetric alpha-stable law, the processes remain non-Gaussian even as width increases.
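
To illustrate the contrast in the two bullets above, the following toy simulation evaluates many independent realizations of a 1D random ReLU network at a single input, once with Gaussian output weights (scaled by the usual 1/sqrt(width) factor) and once with symmetric alpha-stable weights (scaled by width^(-1/alpha)), and compares their empirical tail behavior as the expected width grows. The scalings, the parameter choices, and the use of scipy.stats.levy_stable are illustrative assumptions; the paper establishes its convergence results under its own hypotheses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def network_output_at(x0, lam, weight_sampler, scale):
    """Output at a single input x0 of one toy 1D random ReLU network realization.

    The width is Poisson(2 * lam) with thresholds uniform on [-1, 1]; the output
    weights are rescaled by `scale` so that the wide limit is non-degenerate.
    These are illustrative choices, not the paper's exact construction.
    """
    k = rng.poisson(2.0 * lam)
    thresholds = rng.uniform(-1.0, 1.0, size=k)
    weights = scale * weight_sampler(k)
    return float(np.sum(weights * np.maximum(x0 - thresholds, 0.0)))

alpha = 1.5  # stability index of the symmetric alpha-stable weight law
for lam in (10, 100):
    mean_width = 2 * lam
    gauss = [network_output_at(0.5, lam, rng.standard_normal, mean_width ** -0.5)
             for _ in range(1000)]
    stable = [network_output_at(
                  0.5, lam,
                  lambda k: stats.levy_stable.rvs(alpha, 0.0, size=k, random_state=rng),
                  mean_width ** (-1.0 / alpha))
              for _ in range(1000)]
    # Heavy tails from the alpha-stable weights show up as a much larger
    # (and erratic) empirical excess kurtosis than in the Gaussian case.
    print(f"lam={lam:4d}  kurtosis(gaussian)={stats.kurtosis(gauss):7.2f}  "
          f"kurtosis(stable)={stats.kurtosis(stable):9.2f}")
```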

Implications

This insight means that the behavior of random neural networks depends heavily on their initialization, challenging the traditional assumption that infinite-width networks are inherently Gaussian. In practical terms:

  • Modeling: Understanding the exact behavior helps in better modeling using neural networks, particularly in probabilistic and generative models.
  • Bayesian Inference: Although these methods traditionally assume Gaussianity, recognizing non-Gaussian behaviors opens new doors for more precise inference techniques.

Future Considerations

The findings pave the way for further exploration:

  1. Broader Functions and Architectures: Extending these insights to more complex network structures and different activation functions could uncover richer behaviors.
  2. Real-World Applications: Testing these theoretical results in real-world applications, particularly in reinforcement learning and Bayesian optimization, might significantly impact practical AI designs.
  3. Advanced Statistical Techniques: Developing new methods to better deal with non-Gaussian data arising from these neural network models.

Notable Results

The notable results include a remarkably simple closed-form expression for the autocovariance function and rigorous proofs that these networks need not converge to Gaussian processes in the wide limit, a clear departure from traditional assumptions.

In summary, this research presents a new lens to view the behavior of shallow, random ReLU neural networks, emphasizing their non-Gaussian nature under specific conditions. The implications for both theory and practice in AI are substantial, suggesting further avenues for exploration and innovation.
