
Knowledge Fusion of Large Language Models

(2401.10491)
Published Jan 19, 2024 in cs.CL

Abstract

While training LLMs from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. By leveraging the generative distributions of source LLMs, we externalize their collective knowledge and unique strengths, thereby potentially elevating the capabilities of the target model beyond those of any individual source LLM. We validate our approach using three popular LLMs with different architectures--Llama-2, MPT, and OpenLLaMA--across various benchmarks and tasks. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation. Our code, model weights, and data are public at https://github.com/fanqiwan/FuseLLM.

Figure: Conventional fusion techniques versus FuseLLM's knowledge transfer method for LLMs with diverse architectures.

Overview

  • The paper introduces knowledge fusion in LLMs to merge the strengths of multiple pre-existing LLMs and create a superior model.

  • Knowledge fusion utilizes the generative distributions of source LLMs by aligning token probabilities to transfer knowledge to a target LLM.

  • The methodology enables knowledge transfer without the need for homogeneous architectures or running multiple models simultaneously.

  • Evaluations of the knowledge fusion technique show significant performance enhancements over individual models and ensemble methods in tasks such as reasoning and code generation.

  • The paper suggests that knowledge fusion can lead to more advanced, cost-effective, and environmentally sustainable AI language processing applications.

Introduction

In the landscape of NLP, the development of LLMs represents a significant stride forward in the ability of machines to process and understand human language. Training such models, while it yields powerful tools, demands substantial compute and resources. The paper introduces an alternative to building these complex models from the ground up: a technique called knowledge fusion, which merges the expertise of several pre-existing LLMs to produce a more capable successor without the costs and environmental impact traditionally associated with training from scratch.

Methodology

The knowledge fusion strategy departs from traditional approaches, which typically require homogeneous model architectures or keep multiple models running in parallel. Instead, it harnesses the predictive signal embedded in the generative distributions of the various source LLMs. By focusing on the token-level probability distributions these models produce, the authors transfer the unique knowledge and strengths of each contributing LLM to a single target LLM through lightweight continual training. The amalgamation occurs not by blending raw model parameters but by aligning the token probabilities each source model assigns to the same text and training the target model against the fused distributions. A minimal sketch of this idea appears below.
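To make the mechanism concrete, here is a minimal sketch in PyTorch of how token-level knowledge fusion might look. It assumes the source models' output distributions have already been mapped onto the target tokenizer's vocabulary, uses a MinCE-style selection (keeping, per example, the source distribution with the lowest cross-entropy against the gold tokens), and trains the target with a mix of the standard causal-LM loss and a divergence term toward the fused distribution. The function names, the selection rule, and the weight `lam` are illustrative assumptions rather than the authors' exact implementation; the released FuseLLM code is the authoritative reference.

```python
import torch
import torch.nn.functional as F

def fuse_source_distributions(source_probs, labels):
    """MinCE-style fusion (illustrative): for each example, keep the source
    distribution whose cross-entropy against the gold tokens is lowest.
    source_probs: list of [batch, seq, vocab] tensors, already aligned to
    the target tokenizer's vocabulary. labels: [batch, seq] gold token ids."""
    per_example_ce = []
    for probs in source_probs:
        ce = F.nll_loss(
            probs.clamp_min(1e-9).log().flatten(0, 1),   # [batch*seq, vocab]
            labels.flatten(),
            reduction="none",
        ).view(labels.shape).mean(dim=-1)                 # [batch]
        per_example_ce.append(ce)
    best_source = torch.stack(per_example_ce).argmin(dim=0)      # [batch]
    stacked = torch.stack(source_probs)                           # [S, batch, seq, vocab]
    return stacked[best_source, torch.arange(labels.size(0))]     # [batch, seq, vocab]

def fusion_loss(target_logits, fused_probs, labels, lam=0.9):
    """Continual-training objective: a weighted mix of the usual causal-LM
    loss on gold tokens and a KL term pulling the target model's predictions
    toward the fused source distribution. `lam` is a hypothetical weight."""
    clm = F.cross_entropy(target_logits.flatten(0, 1), labels.flatten())
    kl = F.kl_div(F.log_softmax(target_logits, dim=-1), fused_probs,
                  reduction="batchmean")
    return lam * clm + (1 - lam) * kl
```

A practical consequence, echoed in the overview above, is that the source models only need to be run once to precompute their distributions over the training corpus; the continual-training step then involves a single target model rather than an ensemble held in memory.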

Evaluation

The authors put their method to the test using three LLMs with distinct architectures: Llama-2, MPT, and OpenLLaMA. Across multiple benchmarks covering reasoning, commonsense understanding, and code generation, the fused model shows a marked improvement over the individual source models and a basic ensemble baseline. Importantly, the improvements are not confined to a single task; the fused model exhibits gains across a broad array of capabilities, hinting at a genuine enrichment of its knowledge base.

Implications and Conclusions

Concluding their findings, the researchers underline the potency and promise of knowledge fusion for LLMs, noting it as a fertile area for future work. Their results show that the fused model surpasses each of its individual sources, suggesting that the collective knowledge of distinct models, when harnessed appropriately, can yield a whole greater than the sum of its parts. The work lays a foundation for more cost-effective, environmentally friendlier, and more capable advances in AI language processing, opening the door to a range of applications that can benefit from stronger LLMs.
