
Abstract

LLMs have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for expensive training costs and high inference latency. Even the most advanced models, such as GPT-4 and Claude 2, often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle." In this paper, we propose LongAgent, a method based on multi-agent collaboration that scales LLMs (e.g., LLaMA) to a context of 128k tokens and demonstrates potential superiority over GPT-4 in long-text processing. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Because members hallucinate, it is non-trivial for the leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism that resolves response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing: an agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 on tasks such as 128k-long text retrieval and multi-hop question answering.

Figure: A scheme showing how LongAgent collaborates by breaking down long texts for collective analysis and response.

Overview

  • Introduces LongAgent, a method that uses multi-agent collaboration to scale LLMs to handle texts of over 100,000 tokens effectively.

  • LongAgent features a leader and member agents that process segments of the text in parallel, improving efficiency on extended inputs.

  • Experimental results show LongAgent outperforming leading models such as GPT-4 on long-text comprehension tasks without increased computational demands.

  • Demonstrates scalability and efficiency, with inference time growing linearly in input length, setting a foundation for further advances in processing extensive documents.

Multi-Agent Collaboration Enhances LLMs for Long-Text Processing

Introduction

LLMs have made significant strides in natural language understanding and problem solving. However, their ability to process long texts remains a considerable challenge, largely due to computational constraints and declining attention performance over extended sequences. This paper introduces LongAgent, a novel approach that employs multi-agent collaboration to scale LLMs, enabling effective handling of documents exceeding 100,000 tokens.

Long Text Handling in LLMs

Traditionally, strategies for extending LLMs' context windows have centered on improving positional encoding and designing mechanisms that manage longer inputs without significant loss of long-range dependency tracking. Despite these advances, models still struggle to process very long texts efficiently, a limitation LongAgent seeks to address through a collaborative agent-based framework.
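For context, a minimal sketch of one such positional-encoding technique, position interpolation for rotary embeddings (RoPE), is shown below; the function and its parameters are illustrative assumptions, not drawn from the paper or any specific library.

```python
def rope_angles(pos: int, dim: int, scale: float = 1.0,
                base: float = 10000.0) -> list[float]:
    """Per-frequency rotation angles for one token position (illustrative).

    scale > 1 compresses positions so that a model trained on length L
    can address roughly scale * L positions (position interpolation).
    """
    effective_pos = pos / scale  # e.g., scale=4.0 maps position 8000 to 2000
    return [effective_pos / base ** (2 * i / dim) for i in range(dim // 2)]
```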

LongAgent Architecture

LongAgent comprises a leader and multiple member agents, each responsible for analyzing a segment of the input text and contributing to a collective understanding. The leader interprets user intent and orchestrates discussion among the members to consolidate information and deduce answers to complex queries. The structure includes an inter-member communication mechanism that resolves conflicting information, addressing the hallucinations individual members produce when interpreting their segments.
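As a concrete illustration of this division of labor, here is a minimal Python sketch of the leader/member pattern, assuming a hypothetical llm_call helper for the underlying model; the real LongAgent prompts and its conflict-resolution protocol are more involved than the simple majority vote used here.

```python
# Minimal sketch of the leader/member pattern (hypothetical helpers;
# not the paper's implementation).
from collections import Counter

def llm_call(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM (e.g., LLaMA-7B)."""
    raise NotImplementedError

def split_into_chunks(text: str, chunk_size: int = 4000) -> list[str]:
    """Break the long document into member-sized segments."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def member_answer(chunk: str, question: str) -> str:
    # Each member only ever sees its own segment of the document.
    return llm_call(f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer:")

def leader_resolve(question: str, answers: list[str]) -> str:
    # The leader reconciles conflicting member answers. LongAgent does this
    # via inter-member communication; this sketch falls back to a simple
    # majority vote over the responses.
    majority, _ = Counter(answers).most_common(1)[0]
    return majority

def long_agent(document: str, question: str) -> str:
    chunks = split_into_chunks(document)
    answers = [member_answer(chunk, question) for chunk in chunks]
    return leader_resolve(question, answers)
```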

Implementation and Evaluation

The paper evaluates LongAgent on both existing benchmarks, such as Needle-in-a-Haystack PLUS, and synthetic tasks designed to probe long-text processing. Experimental results indicate that LongAgent, instantiated with a 7B-parameter LLaMA, outperforms established models including GPT-4 on tasks requiring comprehension of lengthy texts. This advantage is attributed to processing segments in parallel, which simplifies each agent's task and allows larger contexts to be handled without increased computational demands.
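Since members are independent, their calls can be dispatched concurrently; the hedged sketch below extends the hypothetical helpers from the previous snippet with a thread pool to show how per-segment work parallelizes.

```python
# Parallel dispatch of member calls, reusing the hypothetical
# split_into_chunks / member_answer / leader_resolve helpers above.
from concurrent.futures import ThreadPoolExecutor

def long_agent_parallel(document: str, question: str, workers: int = 8) -> str:
    chunks = split_into_chunks(document)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        answers = list(pool.map(lambda c: member_answer(c, question), chunks))
    return leader_resolve(question, answers)
```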

Efficiency and Scalability

A critical examination of LongAgent reveals that its inference time grows linearly with text length, distinguishing it from full-attention mechanisms, whose cost grows quadratically. This scalability, together with a reduced memory footprint, is a significant advantage in practical applications that require processing extensive documents.
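A back-of-the-envelope calculation makes the scaling difference concrete; the window size w below is an assumed per-member budget, not a figure from the paper.

```python
# Full self-attention over n tokens costs O(n**2); splitting n tokens
# across members with a fixed window w costs (n / w) * w**2 = n * w,
# i.e., linear in n.
def full_attention_cost(n: int) -> int:
    return n * n

def chunked_cost(n: int, w: int = 4096) -> int:
    return (n // w) * w * w  # members * per-window cost

for n in (32_000, 64_000, 128_000):
    print(n, full_attention_cost(n) / chunked_cost(n))
# Doubling n doubles chunked_cost but quadruples full_attention_cost,
# so the advantage itself grows linearly with input length.
```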

Conclusion and Future Work

LongAgent marks a pivotal step toward leveraging multi-agent systems to enhance LLM performance on long-text processing. By distributing the cognitive load across multiple agents and using a leader to synthesize their findings, LongAgent demonstrates notable improvements over conventional models in handling extensive documents. Future directions include optimizing the leader's decision-making and extending the architecture to a wider variety of tasks, potentially opening up LLM applications previously constrained by context length.
