Abstract

While LLMs are empowered with broad knowledge, their task-specific performance is often suboptimal. This necessitates fine-tuning LLMs with task-specific data, but such data may be inaccessible due to privacy concerns. In this paper, we propose a novel approach to enhance LLMs with smaller language models (SLMs) that are trained on clients using their private task-specific data. To enable mutual enhancement between LLMs and SLMs, we propose CrossLM, where the SLMs guide the LLM to generate task-specific, high-quality data, and both the LLM and SLMs are enhanced with the generated data. We evaluate CrossLM using publicly accessible language models across a range of benchmark tasks. The results demonstrate that CrossLM significantly enhances the task-specific performance of SLMs on clients and the LLM on the cloud server simultaneously, while preserving the LLM's generalization capability.

Overview

  • The CrossLM framework improves both large language models (LLMs) and small language models (SLMs) by enabling knowledge transfer without direct data sharing, making it suitable for privacy-sensitive environments.

  • CrossLM facilitates federated learning for language models while avoiding extensive resource requirements and preserving data privacy.

  • The framework conducts collaborative training between a central LLM and client-specific SLMs using synthetic datasets, enhancing task-specific performance without needing real client data.

  • Empirical tests show significant performance gains in SLMs and preserved generalization abilities in LLMs after CrossLM training, with minimal performance loss on unrelated tasks.

Overview of CrossLM Framework

The CrossLM framework enhances both LLMs and small language models (SLMs) without requiring direct data sharing. This is particularly important in scenarios where privacy concerns and data governance regulations restrict the use of domain-specific data for model training.

Addressing Privacy and Resource Constraints

The novelty of CrossLM lies in extending federated learning to LLMs without imposing the resource burdens that typically accompany such models. Previous methods either update a subset of LLM parameters on clients or split model training between client and server; both approaches still place significant resource demands on clients and raise potential privacy issues.
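
The summary does not name the specific parameter-efficient or split-training methods it alludes to, so the following is only a rough PyTorch sketch of the first family: adapter-style tuning, where the backbone is frozen and a small residual module is the only part trained and communicated. The toy 768-dimensional backbone and bottleneck size are assumptions for illustration, not details from CrossLM.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone; in a real system this would be the
# frozen LLM, and only the small adapter below would be trained and exchanged.
backbone = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
for param in backbone.parameters():
    param.requires_grad = False  # the bulk of the model stays fixed on the client

# Small bottleneck adapter: the only trainable (and communicated) parameters.
adapter = nn.Sequential(nn.Linear(768, 16), nn.ReLU(), nn.Linear(16, 768))

def forward(x: torch.Tensor) -> torch.Tensor:
    h = backbone(x)
    return h + adapter(h)  # residual adapter on top of the frozen features

out = forward(torch.randn(4, 768))
trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable share of parameters: {trainable / total:.2%}")
```

Even in this reduced form, the client must still host the full backbone and run it on every example, which is exactly the resource burden the paragraph points to and which CrossLM sidesteps by keeping full-size LLM training on the server.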

CrossLM's Collaborative Training

CrossLM distinguishes itself with a client-server collaborative training framework in which each client's SLM is sized to that client's resource capabilities and privacy needs. Its technical crux is data-free knowledge transfer: the LLM's generative strength is used to synthesize task-specific datasets, and the SLMs provide feedback on those datasets that refines the quality of the LLM's output. This mutual reinforcement guides both the LLM and the SLMs toward improved task-specific performance.
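
To make the data flow concrete, below is a minimal Python sketch of one collaborative round. The summary does not specify how client feedback is aggregated or how the LLM is updated, so the score averaging, the quality threshold, and the reward-style weighting are illustrative placeholders rather than CrossLM's actual training objective; all function and type names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SyntheticSample:
    """A labeled text sample produced by the LLM (illustrative structure)."""
    text: str
    label: str


def crosslm_round(
    llm_generate: Callable[[str, int], List[str]],    # label -> synthetic texts
    slm_scorers: List[Callable[[str, str], float]],   # (text, label) -> quality in [0, 1]
    slm_trainers: List[Callable[[List[SyntheticSample]], None]],
    llm_trainer: Callable[[List[SyntheticSample], List[float]], None],
    labels: List[str],
    samples_per_label: int = 8,
    quality_threshold: float = 0.5,
) -> None:
    """One collaborative round: the LLM synthesizes labeled data, the clients'
    SLMs score it, high-quality samples fine-tune the SLMs, and the aggregated
    scores feed back to steer the LLM toward the task distribution."""
    synthetic: List[SyntheticSample] = []
    feedback: List[float] = []
    for label in labels:
        for text in llm_generate(label, samples_per_label):
            sample = SyntheticSample(text=text, label=label)
            # Average the clients' quality judgments (an assumed aggregation rule).
            score = sum(fn(text, label) for fn in slm_scorers) / len(slm_scorers)
            synthetic.append(sample)
            feedback.append(score)

    # Clients fine-tune their SLMs only on samples the ensemble judges useful.
    kept = [s for s, f in zip(synthetic, feedback) if f >= quality_threshold]
    for train_fn in slm_trainers:
        train_fn(kept)

    # The server fine-tunes the LLM with the SLMs' scores as a reward-style
    # weight, nudging generation toward high-quality, task-specific text.
    llm_trainer(synthetic, feedback)


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end without any real models.
    generate = lambda label, n: [f"synthetic {label} review #{i}" for i in range(n)]
    scorers = [lambda text, label: 0.9 if label in text else 0.1 for _ in range(3)]
    trainers = [
        (lambda samples: print(f"client fine-tunes SLM on {len(samples)} samples"))
        for _ in range(3)
    ]
    train_llm = lambda samples, scores: print(
        f"server fine-tunes LLM on {len(samples)} samples, "
        f"mean feedback {sum(scores) / len(scores):.2f}"
    )
    crosslm_round(generate, scorers, trainers, train_llm, labels=["positive", "negative"])
```

The structural point the sketch is meant to convey is that only synthetic text and scalar feedback cross the client-server boundary; the clients' private task data never leaves the clients.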

Experimental Validation

Empirical evaluations show that CrossLM improves the task-specific accuracy of SLMs by an average of 5.8% to 7.8% over standalone training. Compared to data-free knowledge distillation (KD), CrossLM achieves a further accuracy improvement of 2% to 2.7%. The LLM's natural language understanding (NLU) and natural language generation (NLG) capabilities are also significantly strengthened after CrossLM training, with accuracy gains of 18.3% for GPT2-Large and 13.6% for Llama-7B.

Preserving Generalization Capabilities

A critical aspect of CrossLM is the retention of the LLM's generalization capabilities after task-specific enhancement. The empirical findings suggest only marginal performance regressions on unrelated benchmark tasks, signaling that the LLM's broad applicability remains intact.

Concluding Thoughts

CrossLM emerges as an elegant solution that strikes a balance between enhancing task-specific performance and preserving generalization without compromising client data privacy. Its approach not only addresses resource limitations but also adapts to heterogeneous model structures, offering a versatile tool in the practitioner's kit for federated language model training. The framework's synchronous and one-shot learning characteristics add to its practical appeal, marking a step forward in the evolution of collaborative AI model training while safeguarding data privacy.
