LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

(2401.01325)
Published Jan 2, 2024 in cs.CL, cs.AI, and cs.LG

Abstract

It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at \url{https://github.com/datamllab/LongLM}.

Figure: SelfExtend's attention mechanism merges two components, neighbor attention for token pairs inside the neighbor window and grouped attention for pairs outside it.

Overview

  • The paper introduces a novel 'Self-Extend' technique enabling LLMs to process longer text sequences without fine-tuning or additional training.

  • Self-Extend remaps the relative positions of tokens in sequences longer than the pretraining window back into the trained range, using 'grouped attention' to coarsen positional information for distant tokens while preserving their order.

  • Experiments show that Self-Extend maintains low perplexity and outperforms other methods on long-sequence tasks without any further training.

  • The approach promises cost savings and efficiency, since no additional training is required, and the authors point to future enhancements such as a Flash Attention implementation.

  • The paper acknowledges limitations, such as the finite extent of the context-window extension and the lack of standardized evaluations for long-context tasks.

Overview

This paper introduces a novel approach, termed Self-Extend, for enabling LLMs to process text sequences significantly longer than their original training limits. The method lets LLMs recognize and use longer contexts without any fine-tuning or additional training. The fundamental premise is that LLMs, much like humans who can follow lengthy texts without having been trained specifically on them, possess an intrinsic capability to handle long contexts that has not been fully leveraged.

Methodology

The difficulty LLMs have with longer texts is often attributed to the relative positional encodings becoming out-of-distribution (OOD) when the model encounters sequence lengths beyond its pretraining context window. To address this, Self-Extend maps unseen relative positions at inference time back onto positions encountered during training, mimicking the way humans approximate, rather than precisely track, the relative placement of distant parts of a text.
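
As a concrete, hedged illustration of this remapping (the window and group sizes below are example values, not settings taken from the paper): a model pretrained with a 4,096-token window never saw a relative distance of 20,000, but an integer (floor) division by a group size of 8 maps it to 2,500, which lies well inside the trained range.

```python
# Illustrative floor-based remapping of out-of-distribution relative distances.
# The window size and group size here are assumed example values.
pretrain_window = 4096   # largest relative distance seen during pretraining
group_size = 8           # coarseness of the grouped positions

for distance in (10, 3000, 20000, 30000):
    mapped = distance // group_size          # the floor operation
    status = "inside" if mapped < pretrain_window else "outside"
    print(f"distance {distance:>6} -> grouped {mapped:>5} ({status} the trained range)")
```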

This is achieved through what the authors call "grouped attention", which applies a floor (integer division) operation to position indices, partitioning tokens into groups so that the order of information is preserved while the granularity of positional information for distant tokens is reduced. The bi-level attention, regular self-attention for nearby token pairs and grouped attention for distant pairs, allows Self-Extend to model nearby tokens precisely while maintaining coherence over the entire text. The method is simple to implement, requiring only minimal modification to existing model code.
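
As a rough sketch of how the two levels can be combined, the snippet below builds the matrix of effective relative positions that a SelfExtend-style attention would feed into its positional encoding: exact distances inside an assumed neighbor window, floor-grouped distances shifted to line up with that window everywhere else. The group size, window size, and shift convention here are illustrative assumptions; the authors' actual implementation is in the linked repository.

```python
import torch

def self_extend_relative_positions(seq_len, group_size=8, neighbor_window=512):
    """Sketch of a merged relative-position matrix for bi-level attention.

    Exact distances are kept for token pairs inside the neighbor window;
    distant pairs fall back to coarser, floor-grouped distances. The shift
    term keeps the two regions from producing overlapping position values.
    """
    pos = torch.arange(seq_len)
    rel = pos[:, None] - pos[None, :]                  # exact distance i - j

    grouped = pos // group_size                        # floor-grouped positions
    rel_grouped = grouped[:, None] - grouped[None, :]  # floor(i/G) - floor(j/G)
    shift = neighbor_window - neighbor_window // group_size
    rel_grouped = rel_grouped + shift                  # align with the neighbor region

    near = (rel >= 0) & (rel < neighbor_window)        # pairs handled by neighbor attention
    merged = torch.where(near, rel, rel_grouped)       # non-causal entries are masked later
    return merged
```

For example, `self_extend_relative_positions(12, group_size=4, neighbor_window=4)` keeps exact distances 0 to 3 near the diagonal and compresses everything farther away into a much smaller range. With these settings the largest merged position grows only as roughly seq_len / group_size plus the neighbor window, so the input can be several times longer than the pretraining window before any relative position leaves the trained range.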

Experimental Results

Self-Extend's ability to extend context windows is validated across a variety of settings, showing that it keeps perplexity low and helps LLMs maintain performance on long-sequence inputs. The approach is compared against other context-window extension methods on tasks requiring long-sequence understanding, including language modeling, synthetic long-context tasks, and real-world long-context tasks. Remarkably, Self-Extend often outperforms fine-tuning-based methods despite requiring no training and only modest implementation changes.

Implications and Conclusion

The paper concludes by emphasizing the latent ability of LLMs to handle longer contexts, as evidenced by Self-Extend's performance. It highlights the potential cost savings and efficiency gains, since Self-Extend requires no additional model training or fine-tuning. As future directions, the researchers aim to improve efficiency with a Flash Attention implementation and to explore more sophisticated mapping strategies that could further extend context capacity. They also acknowledge current limitations, namely that the context-window extension is finite and that there is not yet consensus on how to evaluate long-context tasks. Overall, Self-Extend offers a promising step toward fully unlocking the long-context processing abilities of LLMs.
