LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild (2402.09997v1)

Published 15 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLMs). The modular and plug-and-play nature of LoRA enables the integration of diverse domain-specific LoRAs to enhance the capabilities of LLMs. Previous research on exploiting multiple LoRAs either focuses on specific isolated downstream tasks or fixes the selection of LoRAs during training. However, in real-world scenarios, LLMs receive diverse prompts covering different tasks, and the pool of candidate LoRAs is often dynamically updated. To bridge this gap, we propose LoraRetriever, a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts. LoraRetriever contains three main components: firstly, identifying and retrieving LoRAs relevant to the given input; secondly, formulating strategies for effectively integrating the retrieved LoRAs; and thirdly, developing efficient batch inference to accommodate heterogeneous requests. Experimental results indicate that LoraRetriever consistently outperforms the baselines, highlighting its practical effectiveness and versatility.

Summary

  • The paper introduces LoraRetriever, which dynamically selects and composes LoRA modules to enhance large language model fine-tuning for mixed real-world tasks.
  • It employs a retrieve-then-compose method that identifies relevant modules and devises integration strategies to improve adaptability and performance.
  • Experimental results show that LoraRetriever consistently outperforms baseline models in efficiently handling heterogeneous batch inference.

LoraRetriever, proposed in "LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild" (15 Feb 2024), aims to enhance the adaptiveness and efficiency of Low-Rank Adaptation (LoRA) for fine-tuning LLMs in diverse real-world scenarios. LoRA enables modular adaptation of LLMs by incorporating domain-specific submodules. However, existing approaches to using multiple LoRA modules typically target isolated tasks or fix the composition at training time, which limits their adaptability to the dynamic nature of real-world tasks and prompts.

The LoraRetriever framework addresses this limitation by employing a retrieve-then-compose approach that dynamically selects and integrates LoRA modules based on the input prompts. This process consists of three key stages (a code sketch follows the list):

  1. Identifying and Retrieving Relevant LoRA Modules: The system first determines which LoRA modules are most pertinent to the given input.
  2. Formulating Integration Strategies: It then devises strategies to effectively combine the retrieved LoRA modules to enhance the LLM's performance on the specific input.
  3. Developing Efficient Batch Inference: Finally, it accommodates heterogeneous requests through efficient batch processing.
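The following is a minimal sketch of how such a retrieve-then-compose pipeline might look. The embedding function, module names, and composition-by-averaging strategy are illustrative assumptions rather than the paper's exact implementation, which retrieves LoRAs with a trained input embedder and supports several composition strategies.

```python
import numpy as np

# Hypothetical LoRA pool: each entry holds an embedding representing the module
# plus its low-rank factors (A, B). All names and sizes here are illustrative.
rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and LoRA rank (toy values)
lora_pool = {
    f"lora_task_{i}": {
        "embedding": rng.normal(size=8),          # module-level representation
        "A": rng.normal(size=(r, d)) * 0.01,      # low-rank down-projection
        "B": rng.normal(size=(d, r)) * 0.01,      # low-rank up-projection
    }
    for i in range(5)
}

def embed(prompt: str) -> np.ndarray:
    """Stand-in for a sentence encoder; a real system would use a trained embedder."""
    local_rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return local_rng.normal(size=8)

def retrieve(prompt: str, k: int = 2):
    """Stage 1: rank LoRA modules by cosine similarity to the input embedding."""
    q = embed(prompt)
    def cos(v):
        return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-8))
    scored = sorted(lora_pool.items(), key=lambda kv: cos(kv[1]["embedding"]), reverse=True)
    return scored[:k]

def compose(retrieved):
    """Stage 2: fuse the retrieved modules by averaging their delta weights (B @ A)."""
    deltas = [m["B"] @ m["A"] for _, m in retrieved]
    return sum(deltas) / len(deltas)

def apply_lora(W0: np.ndarray, prompt: str) -> np.ndarray:
    """Stages 1-2 per input; stage 3 (batching) would group prompts by retrieved modules."""
    return W0 + compose(retrieve(prompt))

W0 = rng.normal(size=(d, d))
W_adapted = apply_lora(W0, "Translate this sentence to French.")
print(W_adapted.shape)  # (16, 16)
```

In a batched setting, the key point is that different samples in the same batch may map to different retrieved LoRA sets, so the adapted weights (or adapter outputs) have to be routed per sample rather than shared across the batch.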

Experimental results indicate that LoraRetriever consistently outperforms baseline models, demonstrating its practical effectiveness and versatility in managing mixed tasks in dynamic environments.

In relation to LoraRetriever, another system worth noting is LoraHub, which also focuses on the composition of LoRA modules for cross-task generalization. LoraHub allows fluid combination of LoRA modules trained on various tasks to achieve improved performance on unseen tasks without requiring additional parameters or gradients (LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, 2023). This system highlights the potential for creating a shared ecosystem of LoRA modules that can be applied to novel tasks, facilitating broader adaptability and user collaboration.
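As a rough illustration of this kind of composition, the snippet below merges several LoRA delta weights with a normalized weight vector. LoraHub searches such weights with a gradient-free optimizer on a few examples of the target task; the fixed weights and random toy data here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4
# Toy pool of LoRA factors (B, A) trained on different upstream tasks.
modules = [(rng.normal(size=(d, r)) * 0.01, rng.normal(size=(r, d)) * 0.01) for _ in range(3)]

def merge(weights):
    """Weighted combination of LoRA delta weights; no new parameters are trained."""
    w = np.asarray(weights, dtype=float)
    w = w / (np.abs(w).sum() + 1e-8)          # keep the merged update well-scaled
    return sum(wi * (B @ A) for wi, (B, A) in zip(w, modules))

# LoraHub tunes the weight vector with a gradient-free optimizer (e.g. CMA-ES)
# scored on a handful of examples from the unseen task; a fixed vector is used here.
delta_W = merge([0.6, 0.3, 0.1])
print(delta_W.shape)  # (16, 16)
```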

Furthermore, frameworks like DoRA, which decompose the pre-trained weights into magnitude and direction components for fine-tuning, aim to bridge the accuracy gap between full fine-tuning and LoRA-based methods by enhancing the learning capacity and stability of LoRA adaptations (DoRA: Weight-Decomposed Low-Rank Adaptation, 14 Feb 2024).
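A minimal numpy sketch of this decomposition, using toy dimensions and assuming the standard column-wise normalization, is shown below; the magnitude vector and the LoRA factors are the trainable parts, while the pre-trained weight stays frozen.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r = 16, 16, 4
W0 = rng.normal(size=(d_out, d_in))            # frozen pre-trained weight
B = np.zeros((d_out, r))                       # LoRA up-projection (zero-init)
A = rng.normal(size=(r, d_in)) * 0.01          # LoRA down-projection
m = np.linalg.norm(W0, axis=0, keepdims=True)  # trainable magnitude, init to column norms

def dora_weight(W0, B, A, m):
    """W' = m * (W0 + B@A) / ||W0 + B@A||_col: magnitude and direction learned separately."""
    V = W0 + B @ A                              # updated direction component
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

print(np.allclose(dora_weight(W0, B, A, m), W0))  # True at init, since B is zero
```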

Overall, LoraRetriever and related methodologies like LoraHub and DoRA represent significant advancements in making LLMs more adaptable and efficient for a wide range of dynamically changing tasks and prompts.
