Long-Context Language Modeling with Parallel Context Encoding (2402.16617v2)
Abstract: Extending LLMs to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and the limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLaMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLaMA-2-Chat, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
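The abstract describes the architecture only at a high level: a small encoder processes the long context in parallel chunks, and the frozen decoder consumes the resulting representations through added cross-attention layers. A minimal PyTorch sketch of that idea is below; the class name, dimensions, projection layer, and the Hugging-Face-style `last_hidden_state` / `hidden_states` accessors are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ParallelContextEncoding(nn.Module):
    """Sketch of the CEPE idea: encode a long context chunk by chunk with a
    small encoder, then let a frozen decoder-only LM attend to the encoded
    chunks via newly trained cross-attention. Names are illustrative."""

    def __init__(self, encoder, decoder, d_enc, d_dec, chunk_size=256):
        super().__init__()
        self.encoder = encoder        # small bidirectional encoder (trainable)
        self.decoder = decoder        # frozen decoder-only LM (e.g., LLaMA-2)
        self.chunk_size = chunk_size
        self.proj = nn.Linear(d_enc, d_dec)  # map encoder dim to decoder dim
        # cross-attention from decoder hidden states to encoded context
        self.cross_attn = nn.MultiheadAttention(d_dec, num_heads=8, batch_first=True)

    def encode_context(self, context_ids):
        # split the long context into fixed-size chunks; each chunk is encoded
        # independently, so the chunks can be processed in parallel
        chunks = context_ids.split(self.chunk_size, dim=1)
        states = [self.encoder(c).last_hidden_state for c in chunks]
        return self.proj(torch.cat(states, dim=1))   # (batch, ctx_len, d_dec)

    def forward(self, input_ids, context_ids):
        ctx = self.encode_context(context_ids)
        # the frozen decoder only sees the local input window
        with torch.no_grad():
            h = self.decoder(input_ids, output_hidden_states=True).hidden_states[-1]
        # decoder states query the encoded long context via cross-attention
        attended, _ = self.cross_attn(query=h, key=ctx, value=ctx)
        return h + attended   # fused representation, e.g., fed to the LM head
```

Under this reading, only the encoder, projection, and cross-attention parameters are trained, which is what lets a model trained on 8K-token documents scale to much longer inputs at inference time.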