Neural Redshift: Random Networks are not Random Functions

(2403.02241)
Published Mar 4, 2024 in cs.LG, cs.AI, and cs.CV

Abstract

Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD) but they cannot account for the capabilities of models from gradient-free methods nor the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But, contrary to common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. Implications. We provide a fresh explanation for the success of deep learning independent from gradient-based training. It points at promising avenues for controlling the solutions implemented by trained models.

The complexity of the functions implemented by random networks rises with weight magnitude and varies with the activation function and network architecture.

Overview

  • The paper explores the concept that neural networks' ability to generalize may be significantly influenced by inherent properties of their architecture, independent of the learning algorithm used.

  • It examines the inductive biases in neural networks initialized with random weights, discovering that these networks demonstrate a preference for functions of a certain level of complexity, which is influenced by architectural features like activation functions and residual connections.

  • Through the use of complexity measures such as Fourier and polynomial decomposition, the study reveals that architectural choices in neural networks can predispose these networks towards generating functions of specific complexities, thereby affecting their generalization behavior.

  • The research suggests that by understanding and manipulating the inductive biases inherent in neural network architecture, it is possible to tailor networks to better suit specific tasks, challenging conventional beliefs about the role of gradient descent in neural network generalization.

Examining the Inductive Biases of Neural Networks through the Lens of Random-Weight Functions

Introduction

The quest to understand the factors contributing to the generalization capabilities of neural networks (NNs) has led to a considerable body of research. Traditionally, much of this effort has been centered on examining the implicit biases of gradient descent as the primary mechanism of learning. However, recent studies challenge this view, suggesting that other factors intrinsic to the neural architectures might play a role in their ability to generalize from limited data. This paper contributes to this discussion by shifting the focus towards the inherent properties of neural network architectures, independent of the learning algorithm employed.

Inductive Biases in Random-Weight Networks

A pivotal part of our investigation involves the study of neural networks initialized with random weights, henceforth referred to as random-weight networks. Contrary to the common intuition that these networks would behave like random functions, our analyses reveal that even untrained neural networks exhibit strong inductive biases. These biases manifest as a tendency of the networks to represent functions of a certain level of complexity, which does not necessarily align with the notion of "simplicity bias" often attributed to neural networks. Our findings indicate that the complexity preference of neural networks is not a universal trait but is significantly influenced by architectural components such as activation functions, residual connections, and layer normalizations.
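To make the object of study concrete, here is a minimal, illustrative sketch of a random-weight network: an untrained ReLU MLP whose weights are drawn at random, viewed as a scalar function on a 1-D input grid. The depth, width, and Gaussian 1/sqrt(fan-in) initialization are assumptions chosen for illustration, not the paper's exact setup.

```python
import numpy as np

def random_mlp(depth=4, width=64, w_scale=1.0, seed=0):
    """Sample weights for an untrained MLP from R to R with ReLU hidden layers."""
    rng = np.random.default_rng(seed)
    dims = [1] + [width] * depth + [1]
    # 1/sqrt(fan_in) scaling; w_scale controls the overall weight magnitude.
    return [(rng.normal(0, w_scale / np.sqrt(d_in), size=(d_in, d_out)),
             rng.normal(0, w_scale, size=d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    """Evaluate the random-weight MLP on inputs x of shape (n, 1)."""
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:      # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h

# One random draw of weights defines one function; here it is sampled on [-1, 1].
xs = np.linspace(-1.0, 1.0, 512).reshape(-1, 1)
ys = mlp_forward(random_mlp(seed=42), xs)
print(ys.shape)  # (512, 1): one scalar output per grid point
```

Sampling many such weight draws and measuring the complexity of the resulting functions is the kind of experiment the analysis rests on.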

We employ a variety of complexity measures including Fourier decomposition, polynomial decomposition, and Lempel-Ziv (LZ) complexity to rigorously analyze the inductive biases of neural networks. Through this multi-faceted approach, we uncover that while networks with ReLU activations and those incorporating residual connections or layer normalization are inclined towards generating functions of lower complexity, the bias towards simplicity is not a foregone conclusion for all architectures.
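As a rough illustration of such measures, the sketch below computes two generic complexity proxies for a function sampled on a grid: the fraction of Fourier energy above a low-frequency cutoff, and the compressed size of the quantized values, with zlib's DEFLATE (an LZ77 variant) standing in for a Lempel-Ziv complexity estimate. These proxies and the test signals are assumptions for illustration, not necessarily the exact estimators used in the paper.

```python
import zlib
import numpy as np

def high_freq_energy_fraction(ys, cutoff=10):
    """Fourier proxy: fraction of spectral energy above a low-frequency cutoff."""
    spectrum = np.abs(np.fft.rfft(ys - ys.mean())) ** 2
    total = spectrum.sum()
    return float(spectrum[cutoff:].sum() / total) if total > 0 else 0.0

def lz_proxy(ys, n_bins=16):
    """LZ-style proxy: compressed size of the quantized function values.
    More compressible outputs count as simpler functions."""
    lo, hi = float(ys.min()), float(ys.max())
    if hi == lo:
        q = np.zeros(len(ys), dtype=np.uint8)
    else:
        q = ((ys - lo) / (hi - lo) * (n_bins - 1)).astype(np.uint8)
    return len(zlib.compress(q.tobytes()))

# A smooth and a wiggly test signal stand in for network outputs on a grid.
xs = np.linspace(-1, 1, 512)
smooth, wiggly = np.sin(2 * np.pi * xs), np.sin(40 * np.pi * xs)
print(high_freq_energy_fraction(smooth), high_freq_energy_fraction(wiggly))
print(lz_proxy(smooth), lz_proxy(wiggly))
```

In practice, these scores would be computed on the outputs of many random-weight networks (such as those in the previous sketch) to characterize the distribution of function complexity induced by an architecture.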

Implications for Deep Learning

Our research provides fresh insights into the success of deep learning, suggesting that it is not solely reliant on gradient-based optimization methods. By elucidating how certain architectural choices predispose networks towards functions of a particular complexity, we unveil avenues for controlling the generalization behavior of trained models. This understanding underscores the importance of architectural design in deep learning and challenges the conventional wisdom surrounding the role of gradient descent in the generalization capabilities of neural networks.

Towards a Future of Tailored Complexity Bias

The notion that neural networks' parameter space is inherently biased towards functions of certain complexities opens up the potential for deliberate manipulation of these biases to suit specific tasks. By adjusting architectural elements such as activation functions and the magnitude of weights, we demonstrate that it is feasible to modulate the complexity bias of a network. This capability to tailor the inductive bias of neural networks could prove instrumental in tackling tasks where a mismatch exists between the complexity of the target function and the inherent bias of the network architecture.
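A hedged sketch of this kind of manipulation: the snippet below swaps the activation function and scales the weight magnitude of untrained MLPs, then compares a Fourier-based complexity proxy averaged over random draws. The architectures, scales, and proxy are illustrative assumptions; exact numbers vary with the seed, but ReLU networks with small weights should generally score lower than sin networks or networks with larger weights.

```python
import numpy as np

def random_mlp_output(activation, w_scale, depth=4, width=64, n=512, seed=0):
    """Output of an untrained MLP with a pluggable activation on a grid in [-1, 1]."""
    rng = np.random.default_rng(seed)
    h = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
    d_in = 1
    for layer in range(depth + 1):
        d_out = width if layer < depth else 1
        W = rng.normal(0, w_scale / np.sqrt(d_in), size=(d_in, d_out))
        b = rng.normal(0, w_scale, size=d_out)
        h = h @ W + b
        if layer < depth:                 # activation on hidden layers only
            h = activation(h)
        d_in = d_out
    return h.ravel()

def high_freq_fraction(ys, cutoff=10):
    """Fraction of spectral energy above a low-frequency cutoff."""
    spectrum = np.abs(np.fft.rfft(ys - ys.mean())) ** 2
    total = spectrum.sum()
    return float(spectrum[cutoff:].sum() / total) if total > 0 else 0.0

# Average the proxy over several random weight draws for each configuration.
for name, act in [("relu", lambda z: np.maximum(z, 0.0)),
                  ("tanh", np.tanh),
                  ("sin", np.sin)]:
    for w_scale in (0.5, 2.0):
        vals = [high_freq_fraction(random_mlp_output(act, w_scale, seed=s))
                for s in range(10)]
        print(f"{name:>4}  w_scale={w_scale}:  high-freq fraction ~= {np.mean(vals):.3f}")
```

The design point illustrated here is that both knobs, the nonlinearity and the weight scale, shift the complexity of the functions an architecture samples by default, before any training takes place.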

Relevance to Transformer Models

In extending our analysis to transformer-based sequence models, we observe that transformers inherit the complexity biases of their constituent components. This realization not only reinforces the importance of architectural considerations in the design of neural models but also offers a fresh perspective on the observed tendencies of transformers, such as their predilection for generating simple, repetitive sequences.
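The following is a minimal sketch of that observation rather than the paper's experiment: a tiny transformer with freshly initialized random weights generates tokens greedily, and the number of distinct tokens and bigrams serves as a crude repetitiveness proxy. The model sizes, start token, and decoding scheme are arbitrary assumptions, and counts vary with the seed; greedy decoding from an untrained model often falls into short repeating loops, consistent with the tendency described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model, n_layers, max_len = 100, 64, 2, 60

# A tiny decoder-style transformer with freshly initialized (untrained) weights.
emb = nn.Embedding(vocab, d_model)
pos = nn.Embedding(max_len, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, dim_feedforward=128,
                                   dropout=0.0, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
head = nn.Linear(d_model, vocab)

@torch.no_grad()
def generate(n_tokens=50):
    """Greedy autoregressive sampling from the untrained model."""
    tokens = [0]                                  # arbitrary start token
    for _ in range(n_tokens):
        ids = torch.tensor(tokens).unsqueeze(0)   # (1, t)
        x = emb(ids) + pos(torch.arange(len(tokens))).unsqueeze(0)
        mask = nn.Transformer.generate_square_subsequent_mask(len(tokens))
        logits = head(blocks(x, mask=mask))       # (1, t, vocab)
        tokens.append(int(logits[0, -1].argmax()))
    return tokens[1:]

seq = generate()
# Repetitiveness proxy: how few distinct tokens/bigrams the sequence uses.
bigrams = list(zip(seq, seq[1:]))
print("distinct tokens:", len(set(seq)), "of", len(seq))
print("distinct bigrams:", len(set(bigrams)), "of", len(bigrams))
```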

Conclusion

In sum, this work takes significant strides in broadening our comprehension of the factors that drive the generalization abilities of neural networks. By focusing on the intrinsic biases of neural architectures, independent from the peculiarities of the optimization process, we provide a nuanced understanding of why certain architectural configurations excel in practice. The implications of our findings extend beyond theoretical interest, offering practical guidance for the design of neural networks tailored to the complexities of the tasks they are intended to solve.
