MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models

(2405.13053)
Published May 19, 2024 in cs.CL and cs.AI

Abstract

The pretrain+fine-tune paradigm is foundational in deploying LLMs across a diverse range of downstream applications. Within this paradigm, Low-Rank Adaptation (LoRA) stands out as a parameter-efficient fine-tuning (PEFT) method and has produced numerous off-the-shelf task-specific LoRA adapters. However, this approach requires explicit task intention selection, which poses challenges for automatic task sensing and switching during inference when multiple existing LoRA adapters are embedded in a single LLM. In this work, we introduce MeteoRA (Multiple-Tasks embedded LoRA), a scalable multi-knowledge LoRA fusion framework designed for LLMs. MeteoRA integrates various LoRA adapters into the base LLM in a Mixture-of-Experts (MoE) style, enabling the model to automatically select the most pertinent adapter based on the task input. This significantly enhances the LLM's capability to handle composite tasks that require different adapters to solve different components of the problem. Our evaluations, featuring the LLaMA2-13B and LLaMA3-8B base models equipped with 28 off-the-shelf LoRA adapters through MeteoRA, demonstrate performance equivalent to that of the individual adapters. Furthermore, both base models equipped with MeteoRA achieve superior performance in sequentially solving composite tasks with ten problems in a single inference process, highlighting the timely intention-switching ability of MeteoRA-embedded LLMs.
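The general idea of MoE-style adapter selection described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the gating design, so the linear gate, the softmax with top-k selection, and all names here (`LoRAAdapter`, `moe_lora_forward`, `gate_w`) are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class LoRAAdapter:
    """One task-specific low-rank update (hypothetical container)."""
    a: Matrix           # down-projection, shape (rank, d_in)
    b: Matrix           # up-projection, shape (d_out, rank)
    scale: float = 1.0  # LoRA scaling factor

    def delta(self, x: Vector) -> Vector:
        """Low-rank contribution: scale * B (A x)."""
        return [self.scale * v for v in matvec(self.b, matvec(self.a, x))]


def moe_lora_forward(
    x: Vector,
    w: Matrix,
    adapters: List[LoRAAdapter],
    gate_w: Matrix,
    top_k: int = 1,
) -> Tuple[Vector, List[int]]:
    """Frozen base projection plus a gated mix of LoRA updates.

    Assumed gating scheme: a linear gate scores every adapter for the
    input, the top_k adapters are kept, and their gate probabilities
    are renormalized to sum to 1.
    """
    probs = softmax(matvec(gate_w, x))
    chosen = sorted(range(len(adapters)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    out = matvec(w, x)  # output of the frozen base weights
    for i in chosen:
        weight = probs[i] / norm
        out = [o + weight * d for o, d in zip(out, adapters[i].delta(x))]
    return out, chosen
```

Because the gate is evaluated per input vector, a different adapter can be chosen for each token, which is one plausible way a model could switch tasks within a single inference pass without an explicit task label.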
