
InternLM2 Technical Report

(2403.17297)
Published Mar 26, 2024 in cs.CL and cs.AI

Abstract

The evolution of LLMs like ChatGPT and GPT-4 has sparked discussions on the advent of AGI. However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.

Figure: Statistics of code data in the pre-training corpus.

Overview

  • InternLM2 is an open-source Large Language Model pre-trained on a mix of text, code, and long-context data.

  • The model adopts a Grouped-Query Attention (GQA) mechanism for efficient long-sequence processing and uses a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy for alignment.

  • It has been evaluated across various tasks, showing strong performance in coding, reasoning, mathematics, and long-context modeling, demonstrating significant improvements post alignment.

  • Future developments focus on refining alignment strategies and expanding pre-training data to enhance LLM effectiveness and applicability.

InternLM2: A Comprehensive Overview of Pre-Training and Alignment Strategies

Pre-training Process and Data Preparation

The development of InternLM2, an open-source Large Language Model (LLM), involves meticulous pre-training on a diverse mixture of text, code, and long-context data. The pre-training corpus amasses trillions of tokens from various sources, including web pages, academic papers, and publicly available text resources. Special attention is paid to data quality, ensuring that the corpus is relevant and encompasses a wide knowledge base.

Text Data

Text data is collected from multiple sources and rigorously processed through steps including standardization, deduplication, and safety filtering to ensure not only the diversity but also the safety and quality of the pre-training corpus.
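A minimal sketch of what such a cleaning pass might look like is shown below, written in plain Python. The report's actual pipeline is considerably more involved (it includes fuzzy deduplication and learned safety and quality filters), so the blocklist, normalization choices, and exact-hash deduplication here are simplifying assumptions for illustration only.

    import hashlib
    import unicodedata

    BLOCKLIST = {"casino", "viagra"}  # hypothetical stand-in for real safety filters

    def normalize(text: str) -> str:
        # Standardization: Unicode normalization, lowercasing, whitespace collapsing.
        text = unicodedata.normalize("NFKC", text)
        return " ".join(text.lower().split())

    def clean_corpus(docs):
        seen = set()
        for doc in docs:
            norm = normalize(doc)
            # Safety filtering: drop documents containing blocklisted terms.
            if any(term in norm for term in BLOCKLIST):
                continue
            # Deduplication: drop exact duplicates by hash of the normalized text.
            digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            if digest in seen:
                continue
            seen.add(digest)
            yield doc

    docs = ["Hello   World", "hello world", "Win big at the casino!"]
    print(list(clean_corpus(docs)))  # ['Hello   World']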

Code Data

Given the increasing importance of programming and coding skills in LLMs, InternLM2's pre-training data notably includes a significant amount of code data. This corpus is carefully curated to cover a wide range of programming languages and domains, enhancing the model's coding capabilities.
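The report does not reduce code curation to a handful of rules, but the hedged sketch below shows the kind of heuristic pre-filtering such curation can start with: tagging files by language and rejecting content that is unlikely to be useful training data. The extension map, thresholds, and rejection criteria are hypothetical.

    from pathlib import Path

    # Hypothetical extension-to-language map and thresholds; the report's actual
    # curation also uses format cleaning, deduplication, and learned quality scoring.
    LANG_BY_EXT = {".py": "python", ".cpp": "cpp", ".js": "javascript", ".go": "go"}
    MAX_LINE_LEN = 1000
    MIN_ALNUM_RATIO = 0.25

    def keep_code_file(path: str, source: str):
        lang = LANG_BY_EXT.get(Path(path).suffix)
        if lang is None:
            return None  # unknown language, skip
        if any(len(line) > MAX_LINE_LEN for line in source.splitlines()):
            return None  # likely minified or generated code
        alnum = sum(ch.isalnum() for ch in source)
        if not source or alnum / len(source) < MIN_ALNUM_RATIO:
            return None  # mostly symbols or binary-like content
        return {"path": path, "language": lang, "text": source}

    print(keep_code_file("example.py", "def add(a, b):\n    return a + b\n"))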

Long Context Data

InternLM2 stands out for its effective incorporation of long-context data during pre-training. This step enables the model to efficiently handle long-context scenarios, vastly expanding its application potential. The long-context data preparation involves additional filtering and quality checks to ensure its relevance and utility in training.
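As a rough illustration of the selection step, the sketch below keeps only documents above a length threshold. The 32k-token figure and the whitespace-token approximation are assumptions, and the report applies further statistical and quality filters on top of length selection.

    # Minimal sketch of one long-context selection step: keep only documents that
    # exceed a length threshold, using whitespace tokens as a rough proxy for the
    # real tokenizer. The threshold value is an assumption for illustration.
    MIN_TOKENS = 32_000

    def select_long_context(docs):
        for doc in docs:
            approx_tokens = len(doc.split())
            if approx_tokens >= MIN_TOKENS:
                yield doc

    short_doc = "word " * 1_000
    long_doc = "word " * 40_000
    print(sum(1 for _ in select_long_context([short_doc, long_doc])))  # 1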

Innovative Pre-training and Optimization Techniques

InternLM2 pre-training consists of three distinct phases, focusing on models that efficiently capture long-term dependencies. The Grouped-Query Attention (GQA) mechanism is adopted to decrease memory requirements during inference, making long-sequence processing more feasible.
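The sketch below illustrates the core of grouped-query attention, assuming a NumPy environment: several query heads share each key/value head, so the key/value cache that must be kept in memory at inference time shrinks by the group factor. The head counts and dimensions are illustrative, not InternLM2's actual configuration.

    import numpy as np

    def grouped_query_attention(q, k, v, n_kv_heads):
        """Minimal grouped-query attention: q has n_heads, k/v have n_kv_heads,
        and each k/v head is shared by n_heads // n_kv_heads query heads."""
        n_heads, seq_len, head_dim = q.shape
        group_size = n_heads // n_kv_heads
        # Repeat each k/v head so it serves its whole group of query heads.
        k = np.repeat(k, group_size, axis=0)          # (n_heads, seq_len, head_dim)
        v = np.repeat(v, group_size, axis=0)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
        # Causal mask: each position attends only to itself and earlier tokens.
        mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), 1)
        scores = np.where(mask, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                            # (n_heads, seq_len, head_dim)

    # Example: 8 query heads sharing 2 key/value heads.
    rng = np.random.default_rng(0)
    q = rng.normal(size=(8, 16, 64))
    k = rng.normal(size=(2, 16, 64))
    v = rng.normal(size=(2, 16, 64))
    print(grouped_query_attention(q, k, v, n_kv_heads=2).shape)  # (8, 16, 64)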

Conditional Online RLHF and Alignment Strategies

InternLM2's alignment phase employs a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy. This involves the use of a conditional reward model to harmonize conflicting human preferences and multi-round RLHF to address emergent reward hacking behaviors. The conditional reward model dynamically adjusts its priorities based on the given conditions, maintaining consistent performance across varied tasks.
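The conditioning idea can be illustrated with a small sketch: a condition prompt stating which preference to judge is prepended to the reward model's input, so a single scorer can serve several, possibly conflicting, criteria. The condition texts and the stand-in scorer below are hypothetical, not the prompts or reward model used in the report.

    # Hypothetical condition prompts; one per preference domain the scorer should judge.
    CONDITIONS = {
        "helpful": "Score how helpful and complete the response is.",
        "harmless": "Score how safe and harmless the response is.",
        "code": "Score the correctness and quality of the code in the response.",
    }

    def conditional_reward(score_fn, condition: str, prompt: str, response: str) -> float:
        if condition not in CONDITIONS:
            raise ValueError(f"unknown condition: {condition}")
        # The condition prompt is prepended so one model can encode several,
        # possibly conflicting, preference criteria.
        reward_input = f"{CONDITIONS[condition]}\n\nUser: {prompt}\nAssistant: {response}"
        return score_fn(reward_input)

    # Stand-in scorer for demonstration; a real reward model is a trained LM scoring head.
    dummy_score = lambda text: float(len(text) % 7)
    print(conditional_reward(dummy_score, "helpful", "What is GQA?", "Grouped-query attention..."))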

Comprehensive Evaluation and Analysis

InternLM2 is evaluated across several benchmarks covering a wide array of tasks and capabilities, including comprehensive examinations, knowledge tasks, coding problems, reasoning, mathematics, and long-context modeling. InternLM2 achieves strong results across these benchmarks and shows significant performance improvements after alignment training, demonstrating its effectiveness in aligning with human preferences and extending its utility in real-world applications.

InternLM2's performance in coding tasks, specifically in Python and multiple programming languages, showcases its robust coding capabilities. Similarly, in long-context modeling tasks, InternLM2 demonstrates exceptional performance, marking it as a versatile model capable of handling intricate tasks requiring extensive contextual understanding.
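A "Needle-in-a-Haystack" style probe can be assembled along the lines of the sketch below: a target fact is buried at a chosen depth inside long filler text, and the model is asked to retrieve it. The filler sentence, needle, and question are illustrative, not the actual test data used in the report.

    def build_haystack_prompt(needle: str, question: str, n_filler: int, depth: float) -> str:
        # Bury the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text.
        filler = "The quick brown fox jumps over the lazy dog. " * n_filler
        sentences = filler.split(". ")
        sentences.insert(int(len(sentences) * depth), needle)
        context = ". ".join(sentences)
        return f"{context}\n\nQuestion: {question}\nAnswer:"

    prompt = build_haystack_prompt(
        needle="The secret passphrase is 'blue-falcon-42'",
        question="What is the secret passphrase?",
        n_filler=2_000,
        depth=0.5,
    )
    print(len(prompt))  # prompt length grows with n_filler to probe longer contexts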

Implications and Future Developments

InternLM2's comprehensive development strategy, focusing on diverse pre-training data, innovative optimization techniques, and strategic alignment training, outlines a promising approach to advancing LLM capabilities. The release of pre-training checkpoints offers the community valuable insights into the evolution of LLMs. Looking ahead, the continual refinement of alignment strategies and expansion of pre-training data can further enhance LLMs' effectiveness, broadening their applicability across numerous domains.
