
HGRN2: Gated Linear RNNs with State Expansion

(arXiv:2404.07904)
Published Apr 11, 2024 in cs.CL

Abstract

Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so that the recurrent state size can be significantly enlarged without introducing any additional parameters. The linear attention form also allows for hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN1 in language modeling, image classification, and Long Range Arena. Our largest 3B HGRN2 model slightly outperforms Mamba and the LLaMa-architecture Transformer for language modeling in a controlled experiment setting, and performs competitively with many open-source 3B models in downstream evaluation while using far fewer total training tokens.

Figure: the HGRN2 architecture combines the HGRU2 token mixer with a GLU channel mixer, using recurrent computation.

Overview

  • HGRN2 introduces a significant improvement over its predecessor HGRN by incorporating an outer-product-based state expansion mechanism, enhancing expressiveness and efficiency.

  • This allows a substantial increase in recurrent state size without adding any parameters, drawing inspiration from linear attention models.

  • HGRN2 demonstrates superior performance on various benchmarks, including language modeling and image classification, outperforming its predecessor and showing competitiveness with state-of-the-art models.

  • The paper points to remaining headroom for linear RNN architectures that pair strong performance with computational efficiency, motivating further work on scalable RNN designs.

Enhancing Linear RNNs with State Expansion: The Introduction of HGRN2

Introduction to HGRN2

The Hierarchically Gated Linear RNN (HGRN) architecture has previously shown promise in language modeling, pairing competitive quality with efficient, linear-complexity inference. Its performance, however, has been constrained by a relatively small recurrent state. HGRN2 addresses this by significantly increasing the recurrent state size without adding parameters, using an outer-product-based state expansion mechanism inspired by linear attention, which improves both expressiveness and efficiency. HGRN2 shows consistent improvements over its predecessor across several benchmarks, including language modeling, image classification, and the Long Range Arena.
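
To make the mechanism concrete, here is a minimal sequential sketch of an outer-product expanded-state recurrence of the kind described above: the data-dependent forget gate decays each row of a matrix-valued state, (1 - f_t) and the input form a rank-1 update, and a query reads the state out in place of HGRN's output gate. Tensor names, shapes, and the exact read-out convention are illustrative rather than taken verbatim from the paper.

```python
import torch

def hgrn2_recurrence(q, f, i):
    """Sequential sketch of an outer-product expanded-state recurrence.

    q, f, i: (T, d) tensors -- query, forget gate in (0, 1), and input.
    Returns per-step outputs (T, d) and the final (d, d) state.
    Names, shapes, and the read-out convention are illustrative.
    """
    T, d = q.shape
    S = torch.zeros(d, d)  # expanded matrix-valued state (d x d instead of d)
    outputs = []
    for t in range(T):
        # decay each row of the state with the data-dependent forget gate,
        # then add a rank-1 outer-product update built from (1 - f_t) and i_t
        S = f[t].unsqueeze(1) * S + torch.outer(1.0 - f[t], i[t])
        # read the state out with a query, in place of HGRN's output gate
        outputs.append(S.T @ q[t])
    return torch.stack(outputs), S
```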

Motivation and Background

The fundamental challenge addressed by HGRN2 is the limited capacity of a fixed-size recurrent state. Two strategies matter for making better use of that state: data-dependent decays for selective information retention, and a larger recurrent state. HGRN made progress on data-dependent decays, but its small state size capped performance. State expansion is the key technique for overcoming this barrier, as demonstrated by contemporary models such as Mamba and gated linear attention (GLA). HGRN2 builds on these insights, expanding the recurrent state to raise model quality without sacrificing efficiency.
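
For contrast, a minimal sketch of the element-wise gated recurrence used by HGRN-style models (output gate omitted; names illustrative): the data-dependent forget gate interpolates between keeping the previous state and writing the new input, and the state holds only d values, which is the fixed-state-size limitation discussed above.

```python
import torch

def hgrn_style_recurrence(f, i):
    """Element-wise gated linear recurrence with data-dependent decay.

    f, i: (T, d) tensors -- forget gate in (0, 1) and input.
    The recurrent state has only d entries, i.e. the fixed (and small)
    state size discussed above; the output gate is omitted for brevity.
    """
    T, d = f.shape
    h = torch.zeros(d)
    states = []
    for t in range(T):
        h = f[t] * h + (1.0 - f[t]) * i[t]  # keep vs. write, per dimension
        states.append(h)
    return torch.stack(states)
```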

HGRN2: Key Innovations

HGRN2 introduces several significant improvements over HGRN1, detailed as follows:

  • State Expansion Through Outer Products: HGRN2 leverages a nonparametric outer-product-based mechanism to expand the recurrent state size effectively. This approach facilitates a substantial increase in state size without the need for additional parameters, thus maintaining parameter efficiency.
  • Efficient Training and Inference: Inspired by the linear attention form, HGRN2 admits a hardware-efficient training algorithm that accelerates computation without compromising scalability or performance (a chunk-wise sketch follows this list).
  • Robust Empirical Evaluation: Across extensive benchmarks, HGRN2 not only outperforms HGRN1 but is also competitive with strong baselines, including Mamba and a LLaMa-architecture Transformer, in language modeling.
  • Scalability and Efficiency: HGRN2 scales efficiently, as demonstrated in controlled experiments in large-scale settings, suggesting headroom for more demanding applications.
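
To illustrate the hardware-efficient training bullet above, the following chunk-wise sketch computes the same expanded-state recurrence in the linear-attention style: decay-adjusted queries and keys turn the intra-chunk work into masked matrix products, while a carried matrix state summarizes everything before the chunk. This is a simplified illustration consistent with the sequential sketch earlier, not the paper's actual kernel; a production implementation works in log space for numerical stability and tiles the computation for the GPU.

```python
import torch

def hgrn2_chunkwise(q, f, i, chunk_size=64):
    """Chunk-wise, linear-attention-style sketch of the same recurrence.

    q, f, i: (T, d) tensors -- query, forget gate in (0, 1), and input.
    Matches the sequential sketch above up to floating-point error, but
    processes the sequence chunk by chunk with matrix products.
    """
    T, d = q.shape
    k, v = 1.0 - f, i                 # "key" / "value" roles of the gates
    S = torch.zeros(d, d)             # state carried across chunk boundaries
    outputs = []
    for start in range(0, T, chunk_size):
        sl = slice(start, start + chunk_size)
        fc, qc, kc, vc = f[sl], q[sl], k[sl], v[sl]
        A = torch.cumprod(fc, dim=0)  # cumulative decay inside the chunk
        q_tilde = qc * A              # decay-adjusted queries
        k_tilde = kc / A              # decay-adjusted keys (can overflow for
                                      # long chunks; real kernels use log space)
        # contribution of everything before the chunk, via the carried state
        o_cross = q_tilde @ S
        # causal intra-chunk contribution, computed as masked matrix products
        o_intra = torch.tril(q_tilde @ k_tilde.T) @ vc
        outputs.append(o_cross + o_intra)
        # roll the carried state forward to the end of the chunk
        S = A[-1].unsqueeze(1) * S + (A[-1] * k_tilde).T @ vc
    return torch.cat(outputs, dim=0)
```

Feeding both sketches the same random q, i and a gate such as f = torch.sigmoid(torch.randn(T, d)) should yield matching outputs up to numerical error, which is a convenient sanity check for the chunk-wise form.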

Practical Implications and Theoretical Contributions

HGRN2’s introduction of state expansion via a simple outer product marks a meaningful shift in how linear RNNs gain capacity for language modeling and beyond, underscoring the potential of linear RNN architectures to deliver strong performance with computational efficiency. Practically, HGRN2 is most attractive in applications where inference speed and model scalability are critical. Theoretically, it offers a clean recipe for growing recurrent state capacity, providing a reference point for subsequent research in this domain.

Conclusion and Future Directions

HGRN2 marks a significant step forward in the evolution of RNNs, balancing the dual objectives of enhancing model expressiveness while maintaining efficiency. By addressing the limitations of its predecessor through state expansion, HGRN2 paves the way for more sophisticated and scalable RNN architectures. Future research will likely explore further optimizations in state expansion techniques and apply HGRN2’s principles to a broader range of applications, from natural language processing to complex multimodal tasks, opening up new frontiers in the field of generative AI.
