A Survey on LoRA of Large Language Models

(2407.11046)
Published Jul 8, 2024 in cs.LG , cs.AI , and cs.CL

Abstract

Low-Rank Adaptation (LoRA), which updates dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. It also offers significant advantages in cross-task generalization and privacy preservation. LoRA has therefore gained much attention recently, and the related literature has grown exponentially, making a comprehensive overview of current progress necessary. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improvement: variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization: methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency improvement: methods that boost the computational efficiency of LoRA; (4) data privacy preservation: methods that use LoRA in federated learning; and (5) applications. The survey also discusses future directions in this field.

Figure: LoRA method applied to federated learning.

Overview

  • The paper surveys Low-Rank Adaptation (LoRA) in LLMs, highlighting it as a parameter-efficient technique for fine-tuning without sacrificing performance.

  • It categorizes existing LoRA research into five areas: improving downstream adaptation, cross-task generalization, efficiency improvements, federated learning applications, and diverse practical applications.

  • The survey concludes by emphasizing LoRA's future potential, identifying areas such as Generative-as-a-Service (GaaS) and LLM-based autonomous agents as promising directions for further research.

A Survey on LoRA of LLMs

The paper "A Survey on LoRA of LLMs" provides a comprehensive and methodical exploration of Low-Rank Adaptation (LoRA) in the context of LLMs. Authored by Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, and Yunjun Gao from Zhejiang University, this survey addresses the fundamental aspects and recent advancements of LoRA, a prominent parameter-efficient fine-tuning technique that updates dense neural network layers with low-rank matrices.

Introduction and Background

LLMs, having grown exponentially in parameter scale, have demonstrated emergent abilities and improved generalization. However, extending their versatility to specific downstream tasks still requires fine-tuning. The traditional approach of full fine-tuning, although effective, is computationally expensive. LoRA stands out as an effective alternative that enables parameter-efficient fine-tuning by updating dense layers through low-rank matrices, achieving this efficiency without compromising the model's performance while keeping computational requirements low.
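The core idea can be sketched in a few lines: a frozen weight matrix W is augmented with a trainable product B·A of rank r, scaled by alpha/r. The sizes and random values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4  # illustrative sizes; rank r << d

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts at 0

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T; only A and B are updated during fine-tuning
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d_in))
y = lora_forward(x)
# With B = 0, the adapted model initially reproduces the frozen model's output exactly.
assert np.allclose(y, x @ W.T)
```

Because only A and B (2·r·d parameters instead of d²) receive gradients, the memory and compute cost of fine-tuning drops sharply, and the trained pair can be merged into W or kept as a detachable plugin.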

Taxonomy of LoRA Research

The survey categorizes the existing research on LoRA into five primary areas:

  1. Improving Downstream Adaptation:

    • Techniques aimed at improving LoRA's performance on specific downstream tasks by breaking the low-rank bottleneck, employing dynamic rank allocation, optimizing the learning procedure, and combining with other learning paradigms.
  2. Cross-task Generalization:

    • Methods that mix multiple LoRA plugins to enhance generalization across different tasks, involving manual designs, learned weights, and expert-based approaches.
  3. Efficiency Improvements:

    • Innovations that boost the computational efficiency of LoRA, including parameter reduction through freezing and pruning, parameter quantization, and parallel computation frameworks.
  4. Federated Learning Applications:

    • Utilizations of LoRA in federated learning contexts to handle data, device, and model heterogeneity and to preserve privacy.
  5. Applications:

    • Demonstrations of LoRA's application across a spectrum of tasks including traditional NLP, code, model alignment, vertical domain-specific tasks, and vision and multimodal tasks.

Downstream Adaptation Improving

The survey explores various strategies to enhance LoRA's effectiveness in downstream adaptation. Methods include stacking LoRAs along fine-tuning iterations, treating updates as gradient compressors to avoid low-rank limitations, and co-updating LLMs with LoRA. Furthermore, dynamic rank allocation methods optimize LoRA's performance by adjusting the ranks during fine-tuning based on the importance of layers or components.

Cross-Task Generalization

LoRA mixtures are crucial for achieving cross-task generalization. The paper discusses approaches involving manually designed weights, weights learned through various optimization schemes, and mixtures of LoRA experts. These techniques enhance the model's capacity to generalize across multiple tasks and improve the adaptability and flexibility of LLMs.
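The mixing idea can be illustrated with a minimal sketch: several task-specific LoRA plugins are combined as a weighted sum of their low-rank updates, where the weights may be hand-designed or learned (here they come from a softmax over fixed, hypothetical logits):

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 6, 2

# Two hypothetical task-specific LoRA plugins, each a (B, A) pair
plugins = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
           for _ in range(2)]

def mixed_delta(weights):
    # Weighted combination of plugin updates; an expert-based router would
    # instead compute these weights per input token.
    return sum(w * B @ A for w, (B, A) in zip(weights, plugins))

logits = np.array([0.3, 1.2])                    # stand-in for learned gating scores
weights = np.exp(logits) / np.exp(logits).sum()  # softmax-normalized mixing weights
delta = mixed_delta(weights)
assert delta.shape == (d, d)
```

Expert-based variants follow the same pattern but make the weights input-dependent, turning the plugin set into a mixture-of-experts over adapters.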

Efficiency Improvements

Efficiency is a cornerstone of LoRA's applicability. The survey explores methods to reduce the parameter footprint through freezing, pruning, and sharing. Additionally, quantization methods, both post-training and during training, are examined for their potential to minimize computational costs. The incorporation of parallel LoRA computing frameworks facilitates efficient fine-tuning and inference, enabling scalable LoRA-based applications.
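The quantization idea can be sketched as follows: the frozen base weight is stored in a compact integer format while the small LoRA factors stay in full precision. Symmetric per-tensor int8 is used here purely for illustration; real methods use more sophisticated schemes:

```python
import numpy as np

rng = np.random.default_rng(3)
d, r, alpha = 8, 2, 4

W = rng.standard_normal((d, d))

# Toy symmetric per-tensor int8 quantization of the frozen base weight
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

A = rng.standard_normal((r, d)) * 0.01  # LoRA factors stay in full precision
B = np.zeros((d, r))

def forward(x):
    # Dequantize the base on the fly; only A and B receive gradients
    W_deq = W_q.astype(np.float64) * scale
    return x @ W_deq.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
err = np.abs(forward(x) - x @ W.T).max()
# Rounding error per weight is at most scale / 2, so the output error is bounded.
assert err <= np.abs(x).sum() * (scale / 2) + 1e-9
```

Because gradients never flow into the quantized base, memory for optimizer states and activations shrinks to roughly the footprint of the adapters.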

LoRA in Federated Learning

Federated learning scenarios benefit substantially from LoRA's parameter efficiency and pluggability. The survey highlights techniques to manage data, device, and model heterogeneity, ensuring robust model performance across various clients. Additionally, methods to safeguard parameter privacy in federated setups are discussed, leveraging LoRA's modular updates to enhance security and efficiency.
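A minimal sketch of why LoRA suits this setting: each client fine-tunes only its small adapter pair locally, and the server aggregates those adapters rather than full model weights. The client values below are random stand-ins for locally trained factors:

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, n_clients = 6, 2, 3

# Each client fine-tunes its own LoRA pair locally (stand-in values here)
client_As = [rng.standard_normal((r, d)) for _ in range(n_clients)]
client_Bs = [rng.standard_normal((d, r)) for _ in range(n_clients)]

# FedAvg-style aggregation: the server only ever sees the lightweight
# adapters, never the full model or the raw client data.
A_avg = np.mean(client_As, axis=0)
B_avg = np.mean(client_Bs, axis=0)

assert A_avg.shape == (r, d) and B_avg.shape == (d, r)
# Caveat: averaging A and B separately is not equivalent to averaging the
# products B @ A; several federated-LoRA methods target exactly this mismatch.
```

The communication cost per round scales with the adapter size (2·r·d) instead of the full parameter count, which is what makes LoRA attractive under device heterogeneity.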

Applications

LoRA's versatility is showcased through its applications in diverse fields. In traditional NLP tasks, code-related tasks, model alignment, and vertical domains like medicine and finance, LoRA fine-tuning significantly reduces computational costs. In the vision domain, LoRA enhances image generation and segmentation tasks. Furthermore, LoRA's adaptability extends to multimodal tasks, facilitating efficient alignment and processing of various data modalities.

Conclusion and Future Directions

The survey succinctly concludes that LoRA's parameter efficiency, pluggability, and compatibility position it as an indispensable tool for LLM applications. The paper identifies potential future directions where LoRA could have a profound impact, including Generative-as-a-Service (GaaS), continued pre-training, and LLM-based autonomous agents. These possibilities underscore the ongoing relevance and potential of LoRA in advancing the capabilities of LLMs.

In summary, this paper provides an insightful and thorough review of LoRA's development, addressing practical and theoretical innovations, and outlines future research trajectories, making it a valuable reference for researchers and practitioners working with LLMs.
