A Survey on LoRA of Large Language Models

(2407.11046)
Published Jul 8, 2024 in cs.LG , cs.AI , and cs.CL

Abstract

Low-Rank Adaptation (LoRA), which updates dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. It also offers significant advantages in cross-task generalization and privacy preservation. LoRA has therefore gained much attention recently, and the related literature has grown exponentially, making a comprehensive overview of current progress necessary. This survey categorizes and reviews the progress from the perspectives of (1) downstream adaptation improvement: variants that improve LoRA's performance on downstream tasks; (2) cross-task generalization: methods that mix multiple LoRA plugins to achieve cross-task generalization; (3) efficiency improvement: methods that boost the computational efficiency of LoRA; (4) data privacy preservation: methods that use LoRA in federated learning; and (5) applications. The survey also discusses future directions in this field.

Figure: LoRA method applied to federated learning.

Overview

  • The paper surveys Low-Rank Adaptation (LoRA) in LLMs, highlighting it as a parameter-efficient technique for fine-tuning without sacrificing performance.

  • It categorizes existing LoRA research into five areas: improving downstream adaptation, cross-task generalization, efficiency improvements, federated learning applications, and diverse practical applications.

  • The survey concludes by emphasizing LoRA's future potential, identifying areas such as Generative-as-a-Service (GaaS) and LLM-based autonomous agents as promising directions for further research.

A Survey on LoRA of LLMs

The paper "A Survey on LoRA of LLMs" provides a comprehensive and methodical exploration of Low-Rank Adaptation (LoRA) in the context of LLMs. Authored by Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, and Yunjun Gao from Zhejiang University, this survey addresses the fundamental aspects and recent advancements of LoRA, a prominent parameter-efficient fine-tuning technique that updates dense neural network layers with low-rank matrices.

Introduction and Background

LLMs, having grown exponentially in parameter scale, have demonstrated emergent abilities and improved generalization. However, extending their versatility to specific downstream tasks still requires fine-tuning. The traditional approach of full fine-tuning, although effective, is computationally expensive. LoRA stands out as an effective alternative that enables parameter-efficient fine-tuning by updating dense layers through low-rank matrices, achieving this efficiency without compromising the model's performance while keeping computational requirements low.
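The core idea can be sketched in a few lines: a frozen weight matrix W is augmented with a trainable product B·A of rank r, scaled by alpha/r. The sizes and random values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4  # illustrative sizes; rank r << d

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts at 0

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T; only A and B are updated during fine-tuning
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d_in))
y = lora_forward(x)
# With B = 0, the adapted model initially reproduces the frozen model's output exactly.
assert np.allclose(y, x @ W.T)
```

Because only A and B (2·r·d parameters instead of d²) receive gradients, the memory and compute cost of fine-tuning drops sharply, and the trained pair can be merged into W or kept as a detachable plugin.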

Taxonomy of LoRA Research

The survey categorizes the existing research on LoRA into five primary areas:

  1. Improving Downstream Adaptation:

    • Techniques aimed at improving LoRA's performance on specific downstream tasks by breaking the low-rank bottleneck, employing dynamic rank allocation, optimizing the learning procedure, and combining with other learning paradigms.
  2. Cross-task Generalization:

    • Methods that mix multiple LoRA plugins to enhance generalization across different tasks, involving manual designs, learned weights, and expert-based approaches.
  3. Efficiency Improvements:

    • Innovations that boost the computational efficiency of LoRA, including parameter reduction through freezing and pruning, parameter quantization, and parallel computation frameworks.
  4. Federated Learning Applications:

    • Utilizations of LoRA in federated learning contexts to handle data, device, and model heterogeneity and to preserve privacy.
  5. Applications:

    • Demonstrations of LoRA's application across a spectrum of tasks including traditional NLP, code, model alignment, vertical domain-specific tasks, and vision and multimodal tasks.

Downstream Adaptation Improving

The survey explores various strategies to enhance LoRA's effectiveness in downstream adaptation. Methods include stacking LoRAs along fine-tuning iterations, treating updates as gradient compressors to avoid low-rank limitations, and co-updating LLMs with LoRA. Furthermore, dynamic rank allocation methods optimize LoRA's performance by adjusting the ranks during fine-tuning based on the importance of layers or components.

Cross-Task Generalization

LoRA mixtures are crucial for achieving cross-task generalization. The paper discusses approaches involving manually designed weights, weights learned through various optimization schemes, and mixtures of LoRA experts. These techniques enhance the model's capacity to generalize across multiple tasks and improve the adaptability and flexibility of LLMs.
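The mixing idea can be illustrated with a minimal sketch: several task-specific LoRA plugins are combined as a weighted sum of their low-rank updates, where the weights may be hand-designed or learned (here they come from a softmax over fixed, hypothetical logits):

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 6, 2

# Two hypothetical task-specific LoRA plugins, each a (B, A) pair
plugins = [(rng.standard_normal((d, r)), rng.standard_normal((r, d)))
           for _ in range(2)]

def mixed_delta(weights):
    # Weighted combination of plugin updates; an expert-based router would
    # instead compute these weights per input token.
    return sum(w * B @ A for w, (B, A) in zip(weights, plugins))

logits = np.array([0.3, 1.2])                    # stand-in for learned gating scores
weights = np.exp(logits) / np.exp(logits).sum()  # softmax-normalized mixing weights
delta = mixed_delta(weights)
assert delta.shape == (d, d)
```

Expert-based variants follow the same pattern but make the weights input-dependent, turning the plugin set into a mixture-of-experts over adapters.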

Efficiency Improvements

Efficiency is a cornerstone of LoRA's applicability. The survey explores methods to reduce the parameter footprint through freezing, pruning, and sharing. Additionally, quantization methods, both post-training and during training, are examined for their potential to minimize computational costs. The incorporation of parallel LoRA computing frameworks facilitates efficient fine-tuning and inference, enabling scalable LoRA-based applications.
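The quantization idea can be sketched as follows: the frozen base weight is stored in a compact integer format while the small LoRA factors stay in full precision. Symmetric per-tensor int8 is used here purely for illustration; real methods use more sophisticated schemes:

```python
import numpy as np

rng = np.random.default_rng(3)
d, r, alpha = 8, 2, 4

W = rng.standard_normal((d, d))

# Toy symmetric per-tensor int8 quantization of the frozen base weight
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)

A = rng.standard_normal((r, d)) * 0.01  # LoRA factors stay in full precision
B = np.zeros((d, r))

def forward(x):
    # Dequantize the base on the fly; only A and B receive gradients
    W_deq = W_q.astype(np.float64) * scale
    return x @ W_deq.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
err = np.abs(forward(x) - x @ W.T).max()
# Rounding error per weight is at most scale / 2, so the output error is bounded.
assert err <= np.abs(x).sum() * (scale / 2) + 1e-9
```

Because gradients never flow into the quantized base, memory for optimizer states and activations shrinks to roughly the footprint of the adapters.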

LoRA in Federated Learning

Federated learning scenarios benefit substantially from LoRA's parameter efficiency and pluggability. The survey highlights techniques to manage data, device, and model heterogeneity, ensuring robust model performance across various clients. Additionally, methods to safeguard parameter privacy in federated setups are discussed, leveraging LoRA's modular updates to enhance security and efficiency.
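A minimal sketch of why LoRA suits this setting: each client fine-tunes only its small adapter pair locally, and the server aggregates those adapters rather than full model weights. The client values below are random stand-ins for locally trained factors:

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, n_clients = 6, 2, 3

# Each client fine-tunes its own LoRA pair locally (stand-in values here)
client_As = [rng.standard_normal((r, d)) for _ in range(n_clients)]
client_Bs = [rng.standard_normal((d, r)) for _ in range(n_clients)]

# FedAvg-style aggregation: the server only ever sees the lightweight
# adapters, never the full model or the raw client data.
A_avg = np.mean(client_As, axis=0)
B_avg = np.mean(client_Bs, axis=0)

assert A_avg.shape == (r, d) and B_avg.shape == (d, r)
# Caveat: averaging A and B separately is not equivalent to averaging the
# products B @ A; several federated-LoRA methods target exactly this mismatch.
```

The communication cost per round scales with the adapter size (2·r·d) instead of the full parameter count, which is what makes LoRA attractive under device heterogeneity.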

Applications

LoRA's versatility is showcased through its applications in diverse fields. In traditional NLP tasks, code-related tasks, model alignment, and vertical domains like medicine and finance, LoRA fine-tuning significantly reduces computational costs. In the vision domain, LoRA enhances image generation and segmentation tasks. Furthermore, LoRA's adaptability extends to multimodal tasks, facilitating efficient alignment and processing of various data modalities.

Conclusion and Future Directions

The survey succinctly concludes that LoRA's parameter efficiency, pluggability, and compatibility position it as an indispensable tool for LLM applications. The paper identifies potential future directions where LoRA could have a profound impact, including Generative-as-a-Service (GaaS), continued pre-training, and LLM-based autonomous agents. These possibilities underscore the ongoing relevance and potential of LoRA in advancing the capabilities of LLMs.

In summary, this paper provides an insightful and thorough review of LoRA's development, addressing practical and theoretical innovations, and outlines future research trajectories, making it a valuable reference for researchers and practitioners working with LLMs.
