
Abstract

LLMs have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for expensive training costs and high inference latency. Even the most advanced models, such as GPT-4 and Claude 2, often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle." In this paper, we propose LongAgent, a method based on multi-agent collaboration that scales LLMs (e.g., LLaMA) to a context of 128k tokens and demonstrates potential superiority over GPT-4 in long-text processing. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Because members hallucinate, it is non-trivial for the leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism that resolves response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing: an agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 on tasks such as 128k-long text retrieval and multi-hop question answering.

Figure: A scheme showing how LongAgent collaborates by breaking down long texts for collective analysis and response.

Overview

  • Introduces LongAgent, a method that uses multi-agent collaboration to scale LLMs to handle texts of over 100,000 tokens effectively.

  • LongAgent features a leader and member agents that process segments of the text in parallel, improving efficiency on extended inputs.

  • Experimental results show LongAgent outperforming leading models such as GPT-4 on long-text comprehension tasks without increased computational demands.

  • Demonstrates scalability and efficiency, with inference time growing linearly in input length, setting a foundation for further advances in processing extensive documents.

Multi-Agent Collaboration Enhances LLMs for Long-Text Processing

Introduction

LLMs have made significant strides in natural language understanding and problem solving. However, their ability to process long texts remains a considerable challenge, largely due to computational constraints and declining attention performance over extended sequences. This paper introduces LongAgent, a novel approach that employs multi-agent collaboration to scale LLMs, enabling effective handling of documents exceeding 100,000 tokens.

Long Text Handling in LLMs

Traditionally, strategies for extending LLMs' context windows have centered on improving positional encoding and designing mechanisms that manage longer inputs without significant loss of long-range dependency tracking. Despite these advances, models still struggle to process very long texts efficiently, a limitation LongAgent seeks to address through a collaborative agent-based framework.
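For context, a minimal sketch of one such positional-encoding technique, position interpolation for rotary embeddings (RoPE), is shown below; the function and its parameters are illustrative assumptions, not drawn from the paper or any specific library.

```python
def rope_angles(pos: int, dim: int, scale: float = 1.0,
                base: float = 10000.0) -> list[float]:
    """Per-frequency rotation angles for one token position (illustrative).

    scale > 1 compresses positions so that a model trained on length L
    can address roughly scale * L positions (position interpolation).
    """
    effective_pos = pos / scale  # e.g., scale=4.0 maps position 8000 to 2000
    return [effective_pos / base ** (2 * i / dim) for i in range(dim // 2)]
```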

LongAgent Architecture

LongAgent comprises a leader and multiple member agents, each responsible for analyzing a segment of the input text and contributing to a collective understanding. The leader interprets user intent and orchestrates discussion among the members to consolidate information and deduce answers to complex queries. The structure includes an inter-member communication mechanism that resolves conflicting information, addressing the hallucinations individual members produce when interpreting their segments.
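As a concrete illustration of this division of labor, here is a minimal Python sketch of the leader/member pattern, assuming a hypothetical llm_call helper for the underlying model; the real LongAgent prompts and its conflict-resolution protocol are more involved than the simple majority vote used here.

```python
# Minimal sketch of the leader/member pattern (hypothetical helpers;
# not the paper's implementation).
from collections import Counter

def llm_call(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM (e.g., LLaMA-7B)."""
    raise NotImplementedError

def split_into_chunks(text: str, chunk_size: int = 4000) -> list[str]:
    """Break the long document into member-sized segments."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def member_answer(chunk: str, question: str) -> str:
    # Each member only ever sees its own segment of the document.
    return llm_call(f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:")

def leader_resolve(question: str, answers: list[str]) -> str:
    # The leader reconciles conflicting member answers. LongAgent does this
    # via inter-member communication; this sketch falls back to a simple
    # majority vote over the responses.
    majority, _ = Counter(answers).most_common(1)[0]
    return majority

def long_agent(document: str, question: str) -> str:
    chunks = split_into_chunks(document)
    answers = [member_answer(chunk, question) for chunk in chunks]
    return leader_resolve(question, answers)
```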

Implementation and Evaluation

The paper evaluates LongAgent on both existing benchmarks, such as Needle-in-a-Haystack PLUS, and synthetic tasks designed to probe long-text processing. Experimental results indicate that LongAgent, instantiated with a 7B-parameter LLaMA, outperforms established models including GPT-4 on tasks requiring comprehension of lengthy texts. This advantage is attributed to processing segments in parallel, which simplifies each agent's task and allows larger contexts to be handled without increased computational demands.
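Since members are independent, their calls can be dispatched concurrently; the hedged sketch below extends the hypothetical helpers from the previous snippet with a thread pool to show how per-segment work parallelizes.

```python
# Parallel dispatch of member calls, reusing the hypothetical
# split_into_chunks / member_answer / leader_resolve helpers above.
from concurrent.futures import ThreadPoolExecutor

def long_agent_parallel(document: str, question: str, workers: int = 8) -> str:
    chunks = split_into_chunks(document)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = list(pool.map(lambda c: member_answer(c, question), chunks))
    return leader_resolve(question, answers)
```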

Efficiency and Scalability

A critical examination of LongAgent reveals that its inference time grows linearly with text length, distinguishing it from full-attention mechanisms, whose cost grows quadratically. This scalability, together with a reduced memory footprint, is a significant advantage in practical applications that require processing extensive documents.
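A back-of-the-envelope calculation makes the scaling difference concrete; the window size w below is an assumed per-member budget, not a figure from the paper.

```python
# Full self-attention over n tokens costs O(n**2); splitting n tokens
# across members with a fixed window w costs (n / w) * w**2 = n * w,
# i.e., linear in n.
def full_attention_cost(n: int) -> int:
    return n * n

def chunked_cost(n: int, w: int = 4096) -> int:
    return (n // w) * w * w  # members * per-window cost

for n in (32_000, 64_000, 128_000):
    print(n, full_attention_cost(n) / chunked_cost(n))
# Doubling n doubles chunked_cost but quadruples full_attention_cost,
# so the advantage itself grows linearly with input length.
```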

Conclusion and Future Work

LongAgent marks a pivotal step toward leveraging multi-agent systems to enhance LLM performance on long-text processing. By distributing the cognitive load across multiple agents and using a leader to synthesize their findings, LongAgent demonstrates notable improvements over conventional models in handling extensive documents. Future directions include optimizing the leader's decision-making and extending the architecture to a wider variety of tasks, potentially opening up LLM applications previously constrained by context length.
