LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (2401.01325v3)
Abstract: It is well known that LLMs cannot generalize well to contexts longer than their training sequence length. This poses challenges when employing LLMs to process long input sequences during inference. In this work, we argue that LLMs have an inherent ability to handle long contexts without fine-tuning. To realize this ability, we propose SelfExtend, which extends the context window of LLMs by constructing bi-level attention: grouped attention and neighbor attention. Grouped attention captures dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. Both levels are computed from the original model's self-attention mechanism during inference. With only a minor code modification, SelfExtend can extend the context window of existing LLMs without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks, and the results show that SelfExtend effectively extends the context window length of existing LLMs. The code can be found at \url{https://github.com/datamllab/LongLM}.
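The bi-level idea lends itself to a compact illustration. Below is a minimal, hypothetical NumPy sketch of how the two levels of relative positions could be merged: ordinary distances are kept within a neighbor window, while distances beyond it are mapped to a coarser scale via floor division and shifted so the two scales join at the boundary. The function name `self_extend_positions` and the parameters `group_size` and `neighbor_window` are illustrative, not the identifiers used in the official repository, and the exact offsets there may differ.

```python
import numpy as np

def self_extend_positions(seq_len, group_size, neighbor_window):
    """Sketch of SelfExtend-style bi-level relative positions.

    Within `neighbor_window`, the ordinary relative distance i - j is kept
    (neighbor attention). Beyond it, positions are floor-divided by
    `group_size` (grouped attention) and shifted so the two levels meet
    at the window boundary.
    """
    q = np.arange(seq_len)[:, None]   # query positions i
    k = np.arange(seq_len)[None, :]   # key positions j
    dist = q - k                      # ordinary relative distances

    # Grouped relative distances, shifted to line up with the neighbor window.
    shift = neighbor_window - neighbor_window // group_size
    grouped = q // group_size - k // group_size + shift

    # Exact distances for nearby tokens, grouped distances for distant ones.
    rel = np.where(dist < neighbor_window, dist, grouped)

    # Causal mask: only keys with j <= i are ever attended to.
    return np.tril(rel)

# Illustrative numbers (assumed, not from the paper's tables): a model
# pretrained on 4k tokens with group_size=8 and a 1k neighbor window would
# cover roughly (4096 - 1024) * 8 + 1024 ≈ 25k tokens of context.
print(self_extend_positions(seq_len=12, group_size=4, neighbor_window=4))
```

Because the merged relative positions never exceed the range seen during pretraining, the original positional encodings can be reused as-is, which is why no fine-tuning is required.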