LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models (2409.00509v2)

Published 31 Aug 2024 in cs.CL

Abstract: LLMs face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, comprising impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training. Furthermore, LongRecipe also preserves the original LLM's capabilities on general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training on a single GPU with 80GB of memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe.
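
The abstract names three components (impactful token analysis, position index transformation, and training optimization) without detailing them. As a rough, non-authoritative illustration of the second component, the sketch below shows one way a position index transformation could let a short training sequence stand in for a much longer context; the function name, the two-chunk split, and the random-offset scheme are assumptions in the spirit of PoSE-style position skipping, not the paper's actual procedure.

```python
import random

def transform_position_ids(train_len: int, target_len: int, seed=None):
    """Hypothetical position index transformation (not the paper's exact method).

    Splits a train_len-token sequence into two contiguous chunks and shifts
    the second chunk's position ids by a random offset, so relative distances
    up to target_len appear during training on only train_len tokens.
    """
    assert 2 <= train_len <= target_len
    rng = random.Random(seed)
    split = rng.randint(1, train_len - 1)           # boundary between the two chunks
    skip = rng.randint(0, target_len - train_len)   # gap inserted between the chunks
    first = list(range(split))                              # positions 0 .. split-1
    second = list(range(split + skip, train_len + skip))    # shifted positions
    return first + second

# Example: an 8k training sequence whose position ids span a 128k window.
pos_ids = transform_position_ids(train_len=8192, target_len=131072, seed=0)
print(pos_ids[0], pos_ids[-1])  # first id is 0; last id falls in [8191, 131071]
```

Under this scheme the maximum position id is train_len + skip - 1 <= target_len - 1, so the ids always fit the target window while the batch itself stays short; the abstract's figure of needing only 30% of the target window would correspond to train_len being roughly 0.3 times target_len.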

Authors (11)
  1. Zhiyuan Hu (30 papers)
  2. Yuliang Liu (82 papers)
  3. Jinman Zhao (20 papers)
  4. Suyuchen Wang (16 papers)
  5. Yan Wang (734 papers)
  6. Wei Shen (181 papers)
  7. Qing Gu (44 papers)
  8. Anh Tuan Luu (69 papers)
  9. See-Kiong Ng (103 papers)
  10. Zhiwei Jiang (24 papers)
  11. Bryan Hooi (159 papers)
Citations (5)
