Emergent Mind

MLPs Learn In-Context

(arXiv:2405.15618)
Published May 24, 2024 in cs.LG and cs.NE

Abstract

In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, has commonly been assumed to be a unique hallmark of Transformer models. In this study, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, we find that MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget. We further show that MLPs outperform Transformers on a subset of ICL tasks designed to test relational reasoning. These results suggest that in-context learning is not exclusive to Transformers and highlight the potential of exploring this phenomenon beyond attention-based architectures. In addition, MLPs' surprising success on relational tasks challenges prior assumptions about simple connectionist models. Altogether, our results endorse the broad trend that "less inductive bias is better" and contribute to the growing interest in all-MLP alternatives to task-specific architectures.

Figure: ICL regression and classification results, showing compute vs. MSE and how MSE varies with context length and data diversity.

Overview

  • The paper demonstrates that multi-layer perceptrons (MLPs) can effectively perform in-context learning (ICL), challenging the notion that this capability is exclusive to Transformer models.

  • Experiments revealed that MLPs often outperform Transformers in relational reasoning tasks, suggesting that weaker inductive biases can lead to better performance with increased data and compute resources.

  • The study supports the heuristic that less inductive bias may be more advantageous, encouraging future research into simpler neural network architectures like MLPs for complex cognitive tasks.

In-context Learning Beyond Transformers: An Evaluation of Multi-Layer Perceptrons

The paper presents an in-depth investigation into the capabilities of multi-layer perceptrons (MLPs) concerning in-context learning (ICL), a task paradigm traditionally considered a hallmark of Transformer models. The findings challenge the common belief that ICL competencies are exclusive to attention-based architectures. MLPs, as well as MLP-Mixer models, exhibit competitive in-context learning abilities given the same compute budget as Transformers. Notably, MLPs even outperform Transformers on a subset of tasks designed to test relational reasoning.

Key Contributions

  1. Demonstration of In-context Learning in MLPs: The authors successfully show that MLPs can perform in-context learning similarly to Transformers, suggesting that the ability is not unique to attention-based models. This finding aligns with the universal approximation capability of MLPs, now extended to in-context scenarios.
  2. Superior Relational Reasoning: MLPs outperform Transformers on relational reasoning tasks, challenging the narrative that more sophisticated architectures with stronger inductive biases are always better suited for complex cognitive tasks.
  3. Less Inductive Bias is Better: The study underscores the concept that models with weaker inductive biases, such as MLPs, can outperform those with stronger biases as data and compute resources grow. This observation supports the broader "bitter lesson" heuristic which posits that general methods tend to win out as compute increases.

Experiments and Results

The authors conduct a series of controlled experiments to test the ICL capabilities of MLPs and Transformers on tasks traditionally seen as benchmarks for ICL.

In-context Regression and Classification

  1. ICL Regression: MLPs and MLP-Mixer models achieve near-optimal mean squared error (MSE), comparable to Transformers, on a series of ICL regression tasks. MLP performance does degrade as the number of context points grows, but the MLP-Mixer remains robust, highlighting the promise of architectures derived from MLPs.
  2. ICL Classification: In classification tasks, both MLPs and Transformers transition from in-weight learning (IWL) to ICL as data diversity increases. MLPs remain competitive with Transformers and handle varying context lengths efficiently.
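The regression setup above can be sketched with a minimal, hypothetical data generator in the spirit of the paper's task: each episode draws its own weight vector, the model sees in-context (x, y) pairs plus a query, and the whole context is flattened into one input vector, the natural way to feed a sequence to a plain MLP. The names and shapes below are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_icl_regression_batch(batch, n_ctx, dim):
    """Sample linear-regression ICL episodes: each episode has its own
    weight vector w; the model must infer w from the in-context pairs.
    (Hypothetical generator, illustrating the task structure only.)"""
    w = rng.normal(size=(batch, dim))             # per-episode task
    x = rng.normal(size=(batch, n_ctx + 1, dim))  # context + query inputs
    y = np.einsum("bnd,bd->bn", x, w)             # targets y = <x, w>
    return x, y

def flatten_for_mlp(x, y):
    """Concatenate the context exemplars and the query into one flat
    vector, so a fixed-input MLP can consume the whole episode."""
    ctx = np.concatenate([x[:, :-1], y[:, :-1, None]], axis=-1)  # (b, n_ctx, dim+1)
    query = x[:, -1]                                             # (b, dim)
    return np.concatenate([ctx.reshape(len(x), -1), query], axis=1)

x, y = make_icl_regression_batch(batch=4, n_ctx=8, dim=3)
inp = flatten_for_mlp(x, y)  # shape (4, 8*(3+1) + 3) = (4, 35)
target = y[:, -1]            # the query label the model must predict
```

Because the episodes are noiseless and the context has more pairs than dimensions, the query label is fully determined by the context, so a model that succeeds must be doing the inference in-context rather than memorizing a fixed mapping.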

Relational Tasks

The paper explores relational reasoning, an advanced subset of ICL classification tasks used to probe higher-order cognitive processing. In these tasks, MLPs not only match but often outperform Transformers.

  1. Match-to-Sample: MLPs achieve lower loss than compute-matched Transformers and remain robust under out-of-distribution conditions.
  2. Sphere and Line Oddball Tasks: On tasks requiring relational reasoning, MLPs excel, generalizing better in out-of-distribution tests than Transformers. Specific architectural modifications, like relationally bottlenecked MLPs, further improve performance, but only when relations align well with task structure.
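The oddball construction can be illustrated with a small synthetic generator, an assumed and simplified version of the line task (the name `make_line_oddball` and the `noise` parameter are hypothetical): all but one of the items lie on a random line through the origin, one item is displaced perpendicular to it, and the model must report the displaced item's index from the context alone.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_line_oddball(n_items=6, dim=2, noise=3.0):
    """One episode: n_items - 1 points are collinear through the origin;
    one point is pushed off the line by `noise`. Label = oddball index.
    (Illustrative construction, not the paper's exact generator.)"""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    points = rng.normal(size=n_items)[:, None] * direction  # collinear points
    odd = int(rng.integers(n_items))
    perp = rng.normal(size=dim)
    perp -= (perp @ direction) * direction  # component orthogonal to the line
    perp /= np.linalg.norm(perp)
    points[odd] += noise * perp             # displace the oddball off the line
    return points, odd

points, label = make_line_oddball()
```

Identifying the oddball requires comparing items to one another rather than recognizing any fixed feature, which is why tasks of this shape are used to probe relational reasoning.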

Discussion and Implications

The findings provide compelling evidence that ICL and relational reasoning can be efficiently performed by MLP architectures. This challenges existing assumptions about the necessity of attention mechanisms for such tasks. The demonstrated capabilities of MLPs suggest potential practical advantages, encouraging further exploration into their utility over more complex, inductively biased models like Transformers.

The study aligns with the heuristic that "less inductive bias is better," especially as compute and data continue to grow. Future research should examine MLPs' performance on more complex datasets and under data-limited conditions to understand the scalability and limitations of these findings.

Conclusion

This paper contributes significantly to the understanding of in-context learning and relational reasoning by simple neural networks. The results promote a broader perspective for exploring alternative architectures to Transformers for ICL tasks. By illustrating that MLPs can indeed learn in-context and perform sophisticated relational reasoning, the study opens new avenues for further research into efficient and generalizable AI models.
