LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild (2402.09997v1)

Published 15 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLMs). The modular and plug-and-play nature of LoRA enables the integration of diverse domain-specific LoRAs to enhance the capabilities of LLMs. Previous research on exploiting multiple LoRAs either focuses on specific isolated downstream tasks or fixes the selection of LoRAs during training. However, in real-world scenarios, LLMs receive diverse prompts covering different tasks, and the pool of candidate LoRAs is often dynamically updated. To bridge this gap, we propose LoraRetriever, a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts. LoraRetriever contains three main components: firstly, identifying and retrieving LoRAs relevant to the given input; secondly, formulating strategies for effectively integrating the retrieved LoRAs; and thirdly, developing efficient batch inference to accommodate heterogeneous requests. Experimental results indicate that LoraRetriever consistently outperforms the baselines, highlighting its practical effectiveness and versatility.

Summary

  • The paper introduces LoraRetriever, which dynamically selects and composes LoRA modules to enhance large language model fine-tuning for mixed real-world tasks.
  • It employs a retrieve-then-compose method that identifies relevant modules and devises integration strategies to improve adaptability and performance.
  • Experimental results show that LoraRetriever consistently outperforms baseline models in efficiently handling heterogeneous batch inference.

LoraRetriever, proposed in "LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild" (15 Feb 2024), aims to enhance the adaptiveness and efficiency of Low-Rank Adaptation (LoRA) for fine-tuning LLMs in diverse real-world scenarios. LoRA enables modular adaptation of LLMs by incorporating domain-specific submodules. However, existing approaches to using multiple LoRA modules typically target isolated tasks or fix the composition at training time, which limits their adaptability to the dynamic nature of real-world tasks and prompts.

The LoraRetriever framework addresses this limitation by employing a retrieve-then-compose approach that dynamically selects and integrates LoRA modules based on the input prompts. This process consists of three key stages (a code sketch follows the list):

  1. Identifying and Retrieving Relevant LoRA Modules: The system first determines which LoRA modules are most pertinent to the given input.
  2. Formulating Integration Strategies: It then devises strategies to effectively combine the retrieved LoRA modules to enhance the LLM's performance on the specific input.
  3. Developing Efficient Batch Inference: Finally, it accommodates heterogeneous requests through efficient batch processing.
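The following is a minimal sketch of how such a retrieve-then-compose pipeline might look. The embedding function, module names, and composition-by-averaging strategy are illustrative assumptions rather than the paper's exact implementation, which retrieves LoRAs with a trained input embedder and supports several composition strategies.

```python
import numpy as np

# Hypothetical LoRA pool: each entry holds an embedding representing the module
# plus its low-rank factors (A, B). All names and sizes here are illustrative.
rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and LoRA rank (toy values)
lora_pool = {
    f"lora_task_{i}": {
        "embedding": rng.normal(size=8),          # module-level representation
        "A": rng.normal(size=(r, d)) * 0.01,      # low-rank down-projection
        "B": rng.normal(size=(d, r)) * 0.01,      # low-rank up-projection
    }
    for i in range(5)
}

def embed(prompt: str) -> np.ndarray:
    """Stand-in for a sentence encoder; a real system would use a trained embedder."""
    local_rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return local_rng.normal(size=8)

def retrieve(prompt: str, k: int = 2):
    """Stage 1: rank LoRA modules by cosine similarity to the input embedding."""
    q = embed(prompt)
    def cos(v):
        return float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-8))
    scored = sorted(lora_pool.items(), key=lambda kv: cos(kv[1]["embedding"]), reverse=True)
    return scored[:k]

def compose(retrieved):
    """Stage 2: fuse the retrieved modules by averaging their delta weights (B @ A)."""
    deltas = [m["B"] @ m["A"] for _, m in retrieved]
    return sum(deltas) / len(deltas)

def apply_lora(W0: np.ndarray, prompt: str) -> np.ndarray:
    """Stages 1-2 per input; stage 3 (batching) would group prompts by retrieved modules."""
    return W0 + compose(retrieve(prompt))

W0 = rng.normal(size=(d, d))
W_adapted = apply_lora(W0, "Translate this sentence to French.")
print(W_adapted.shape)  # (16, 16)
```

In a batched setting, the key point is that different samples in the same batch may map to different retrieved LoRA sets, so the adapted weights (or adapter outputs) have to be routed per sample rather than shared across the batch.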

Experimental results indicate that LoraRetriever consistently outperforms baseline models, demonstrating its practical effectiveness and versatility in managing mixed tasks in dynamic environments.

In relation to LoraRetriever, another system worth noting is LoraHub, which also focuses on the composition of LoRA modules for cross-task generalization. LoraHub allows fluid combination of LoRA modules trained on various tasks to achieve improved performance on unseen tasks without requiring additional parameters or gradients (LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition, 2023). This system highlights the potential for creating a shared ecosystem of LoRA modules that can be applied to novel tasks, facilitating broader adaptability and user collaboration.
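As a rough illustration of this kind of composition, the snippet below merges several LoRA delta weights with a normalized weight vector. LoraHub searches such weights with a gradient-free optimizer on a few examples of the target task; the fixed weights and random toy data here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4
# Toy pool of LoRA factors (B, A) trained on different upstream tasks.
modules = [(rng.normal(size=(d, r)) * 0.01, rng.normal(size=(r, d)) * 0.01) for _ in range(3)]

def merge(weights):
    """Weighted combination of LoRA delta weights; no new parameters are trained."""
    w = np.asarray(weights, dtype=float)
    w = w / (np.abs(w).sum() + 1e-8)          # keep the merged update well-scaled
    return sum(wi * (B @ A) for wi, (B, A) in zip(w, modules))

# LoraHub tunes the weight vector with a gradient-free optimizer (e.g. CMA-ES)
# scored on a handful of examples from the unseen task; a fixed vector is used here.
delta_W = merge([0.6, 0.3, 0.1])
print(delta_W.shape)  # (16, 16)
```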

Furthermore, frameworks like DoRA, which decompose the pre-trained weights into magnitude and direction components for fine-tuning, aim to bridge the accuracy gap between full fine-tuning and LoRA-based methods by enhancing the learning capacity and stability of LoRA adaptations (DoRA: Weight-Decomposed Low-Rank Adaptation, 14 Feb 2024).
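A minimal numpy sketch of this decomposition, using toy dimensions and assuming the standard column-wise normalization, is shown below; the magnitude vector and the LoRA factors are the trainable parts, while the pre-trained weight stays frozen.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r = 16, 16, 4
W0 = rng.normal(size=(d_out, d_in))            # frozen pre-trained weight
B = np.zeros((d_out, r))                       # LoRA up-projection (zero-init)
A = rng.normal(size=(r, d_in)) * 0.01          # LoRA down-projection
m = np.linalg.norm(W0, axis=0, keepdims=True)  # trainable magnitude, init to column norms

def dora_weight(W0, B, A, m):
    """W' = m * (W0 + B@A) / ||W0 + B@A||_col: magnitude and direction learned separately."""
    V = W0 + B @ A                              # updated direction component
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

print(np.allclose(dora_weight(W0, B, A, m), W0))  # True at init, since B is zero
```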

Overall, LoraRetriever and related methodologies like LoraHub and DoRA represent significant advancements in making LLMs more adaptable and efficient for a wide range of dynamically changing tasks and prompts.
