
LLaMA Pro: Progressive LLaMA with Block Expansion

(2401.02415)
Published Jan 4, 2024 in cs.CL

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for LLMs, e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on corpora of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.

LLaMA Pro outperforms LLaMA2-7B across tasks after fine-tuning with the same instruction dataset.

Overview

  • The paper introduces block expansion, a post-pretraining method that lets an LLM acquire new domain-specific skills while retaining its original capabilities, thereby mitigating catastrophic forgetting.

  • A new model, LLaMA Pro-8.3B, is created by interleaving additional Transformer blocks into the pretrained LLaMA2-7B and training them on specialized datasets focused on programming and mathematics (see the sketch after this list).

  • LLaMA Pro has an instruction-following variant, LLaMA Pro-Instruct, fine-tuned on instruction data to better comprehend and follow user commands.

  • The performance of LLaMA Pro is evaluated on diverse benchmarks and real-world scenarios, demonstrating state-of-the-art results and significant improvement over other models in the LLaMA series.

  • The study's promising results indicate potential for future research, especially for applying block expansion to multimodal applications and balancing domain learning with general model competencies.
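
The following is a minimal sketch of the block-expansion idea summarized above. It assumes a Hugging Face LLaMA-style model whose decoder layers expose self_attn.o_proj and mlp.down_proj (as in transformers' LlamaDecoderLayer); the function name and grouping parameter are illustrative and not taken from the paper's released code.

```python
import copy
import torch.nn as nn

def expand_blocks(layers: nn.ModuleList, num_groups: int = 8) -> nn.ModuleList:
    """Interleave one zero-initialized copy after every group of original layers.

    LLaMA2-7B has 32 decoder layers; with num_groups=8 this yields 40 layers,
    matching the LLaMA Pro-8.3B configuration described in the paper.
    """
    group_size = len(layers) // num_groups
    expanded = []
    for i, layer in enumerate(layers):
        expanded.append(layer)
        if (i + 1) % group_size == 0:      # end of a group: append a copied block
            new_block = copy.deepcopy(layer)
            # Zero the output projections of attention and MLP. Because of the
            # residual connections, the copied block then acts as an identity
            # function at initialization, so the expanded model reproduces the
            # original model's outputs before any further training.
            nn.init.zeros_(new_block.self_attn.o_proj.weight)
            nn.init.zeros_(new_block.mlp.down_proj.weight)
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```

In practice the model config's layer count (and any per-layer indexing) must also be updated; the sketch only shows the weight-level construction.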

Introduction to LLaMA Pro

The development of LLMs has been marked by increasingly impressive performance across a range of tasks, yet they struggle to acquire new domain-specific skills without losing their existing, generalized abilities. This phenomenon is known as catastrophic forgetting, and it is a significant barrier when fine-tuning LLMs for domains such as programming and mathematics. The paper introduces block expansion, a method aimed at preserving and augmenting the capabilities of LLMs. The technique expands the stack of Transformer blocks, the essential building units of an LLM, while retaining the existing knowledge base. The resulting model, LLaMA Pro-8.3B, demonstrates strong results across varied benchmarks when compared with other models of the LLaMA series.

Methodology

Block expansion operates during a post-pretraining phase: copies of existing Transformer blocks are interleaved into a pretrained LLM, here LLaMA2-7B. Each copy has its output projections zero-initialized so that, thanks to the residual connections, it acts as an identity function and the expanded model initially reproduces the original model's behavior. The researchers then tune only the newly added blocks on a specialized corpus while keeping the inherited blocks frozen, preserving the model's original capabilities. To build LLaMA Pro, the expanded blocks are trained on datasets concentrated on code and mathematical content. The method also yields LLaMA Pro-Instruct, a variant that undergoes instruction fine-tuning to enhance its ability to understand and execute user instructions.
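
As a rough illustration of this training recipe (not the authors' code), the snippet below freezes every inherited block and leaves only the newly inserted ones trainable. The attribute path model.model.layers follows the Hugging Face LLaMA layout, and new_block_ids is an assumed set of indices produced by the expansion sketch above.

```python
import torch.nn as nn

def freeze_inherited_blocks(model: nn.Module, new_block_ids: set) -> list:
    """Freeze all parameters except those in the newly added decoder blocks
    and return the trainable parameters for the optimizer."""
    for param in model.parameters():
        param.requires_grad = False               # freeze everything first
    trainable = []
    for idx, layer in enumerate(model.model.layers):
        if idx in new_block_ids:                  # expanded blocks stay trainable
            for param in layer.parameters():
                param.requires_grad = True
                trainable.append(param)
    return trainable

# Post-pretraining then optimizes only the new blocks on the code/math corpus,
# e.g. (indices and hyperparameters illustrative):
# optimizer = torch.optim.AdamW(freeze_inherited_blocks(model, {4, 9, 14, 19,
#                                                               24, 29, 34, 39}))
```

Because the inherited blocks never receive gradient updates, the general-purpose behavior of the base model is preserved by construction while the new blocks absorb the domain knowledge.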

Performance and Evaluation

LLaMA Pro’s performance is evaluated on a variety of tasks, comparing favorably against other models in the LLaMA family and achieving state-of-the-art results among them. This is particularly evident in programming benchmarks such as HumanEval and math-focused tasks such as GSM8K. The model is also tested in real-world scenarios, including tool usage and responding to human feedback. Furthermore, LLaMA Pro is compared with other LLMs under an LLM-based evaluation framework, confirming its strong overall performance and adaptability.

Conclusion and Future Directions

The results underline the effectiveness of block expansion as a post-pretraining method for enhancing the skill set of LLMs without catastrophic forgetting. LLaMA Pro shows that a single model can excel both in general linguistic tasks and in highly specialized domains such as programming and mathematics. The research opens avenues for future work on adapting the method to other areas, including multimodal applications, and underscores the importance of balancing domain-specific learning with the retention of general competencies in LLMs.
