DoRA: Weight-Decomposed Low-Rank Adaptation

(arXiv:2402.09353)
Published Feb 14, 2024 in cs.CL and cs.CV

Abstract

Among the widely used parameter-efficient finetuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, an accuracy gap often still exists between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT from the findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding. Code available at https://github.com/NVlabs/DoRA.

Figure: MT-Bench improvement of LLaMA-7B fine-tuned with LoRA, VeRA, DoRA, and DoRA variants, varying with the number of Alpaca training samples.

Overview

  • DoRA, a novel Weight-Decomposed Low-Rank Adaptation method, refines Parameter-Efficient Fine-Tuning (PEFT) by decomposing model weights into magnitude and direction, improving learning efficiency.

  • Comparative analysis shows DoRA's superior learning behavior and task performance over LoRA across various benchmarks, with no extra inference overhead.

  • DoRA significantly enhances PEFT methods, bridging the performance gap between LoRA-based approaches and full fine-tuning in large pre-trained models.

  • The research underscores DoRA's potential for wider application in optimizing fine-tuning processes, suggesting future exploration across other model architectures and tasks.

Enhancing Parameter-Efficient Fine-Tuning in LLMs with DoRA

Introduction to Parameter-Efficient Fine-Tuning (PEFT)

Advances in large pre-trained models have revolutionized the field of NLP and multi-modal tasks. However, fine-tuning these models for specific downstream tasks often requires significant computational resources. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, have emerged as alternatives to alleviate the computational burden. LoRA, in particular, has been recognized for not introducing extra inference overhead. Nonetheless, a performance gap remains between LoRA-based approaches and full fine-tuning (FT), prompting further investigation into bridging this divide.
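To make the baseline concrete, here is a minimal NumPy sketch of the LoRA idea the paragraph describes: the frozen pre-trained weight is augmented by a trainable low-rank product. This is an illustrative sketch, not the authors' implementation; the function name, shapes, and the `alpha` scaling are assumptions for clarity.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha=1.0):
    """LoRA forward pass (illustrative sketch).

    W0 : frozen pre-trained weight, shape (d_out, d_in)
    A  : trainable down-projection,  shape (r, d_in)
    B  : trainable up-projection,    shape (d_out, r), initialized to zero
    The low-rank update B @ A can be merged into W0 after training,
    which is why LoRA adds no inference overhead.
    """
    return x @ (W0 + alpha * (B @ A)).T
```

Because `B` starts at zero, the adapted model initially matches the pre-trained model exactly, and only the small matrices `A` and `B` receive gradients.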

Unveiling the DoRA Framework

Weight-Decomposed Low-Rank Adaptation (DoRA) is introduced as a novel approach that decomposes pre-trained weights into magnitude and direction components for fine-tuning. By focusing LoRA on directional updates, DoRA seeks to refine the learning capabilities associated with PEFT methods. The research compares the learning behaviors and theoretical implications underlying DoRA and LoRA, employing comprehensive experiments across various tasks and backbones, including LLaMA, LLaVA, and VL-BART.
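The decomposition described above can be sketched in a few lines: the weight is rewritten as a per-column magnitude times a unit direction, with a LoRA update applied to the directional part. This is a NumPy sketch under assumed shapes, not the authors' PyTorch code; the function name is hypothetical.

```python
import numpy as np

def dora_merged_weight(W0, B, A, m):
    """Merged DoRA weight (illustrative sketch).

    W0 : frozen pre-trained weight, shape (d_out, d_in)
    B, A : LoRA factors for the directional update,
           shapes (d_out, r) and (r, d_in)
    m  : trainable per-column magnitude vector, shape (d_in,)
    """
    V = W0 + B @ A                      # direction with low-rank update
    col_norms = np.linalg.norm(V, axis=0)
    return m * (V / col_norms)          # magnitude * unit direction
```

At initialization `B` is zero and `m` is set to the column norms of `W0`, so the merged weight reproduces `W0` exactly; like LoRA, the update can be merged after training, avoiding inference overhead.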

Comparative Analysis and Findings

DoRA presents a more nuanced way of updating model weights compared to LoRA, showing improvements in learning capacity and performance across a range of benchmarks. The paper's decomposition analysis reveals distinct update patterns: FT tends to adjust magnitude and direction with a negative correlation, whereas LoRA's magnitude and directional changes scale together, and DoRA's learning behavior closely mimics that of FT. This finding is supported by numerical results in tasks such as commonsense reasoning and visual instruction tuning, where DoRA consistently outperforms LoRA without imposing additional inference overhead.
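The decomposition analysis itself can be sketched as follows: for a tuned weight and its pre-trained counterpart, measure the average change in per-column magnitude and the average directional change (one minus cosine similarity of matching columns). This is a simplified NumPy sketch of the style of analysis described; the exact averaging and layer selection in the paper may differ.

```python
import numpy as np

def decomposition_deltas(W_ft, W0):
    """Magnitude and directional change between a tuned weight W_ft
    and the pre-trained weight W0 (illustrative sketch).

    Returns (delta_m, delta_d):
    delta_m : mean absolute change of per-column norms
    delta_d : mean (1 - cosine similarity) over matching columns
    """
    m0 = np.linalg.norm(W0, axis=0)
    mt = np.linalg.norm(W_ft, axis=0)
    delta_m = np.mean(np.abs(mt - m0))
    cos = np.sum((W_ft / mt) * (W0 / m0), axis=0)
    delta_d = np.mean(1.0 - cos)
    return delta_m, delta_d
```

Plotting `delta_d` against `delta_m` across layers and training steps is what exposes the distinct slopes for FT, LoRA, and DoRA.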

Implications and Future Directions

The introduction of DoRA not only advances the understanding of PEFT methods but also opens new avenues for optimizing fine-tuning processes in large models. The research demonstrates DoRA's capacity to adapt efficiently to a range of different tasks, suggesting its potential applicability beyond the explored domains. As DoRA addresses both practical and theoretical challenges in PEFT, future work could extend its application to other model architectures and fine-tuning scenarios, further consolidating the efficiency and versatility of PEFT methods in adapting pre-trained models to specific tasks.

Conclusion

Exploring the intricacies of parameter-efficient fine-tuning, this work introduces DoRA as a significant enhancement over existing methodologies like LoRA. DoRA not only narrows the accuracy gap with full fine-tuning but also maintains the efficiency hallmark of PEFT methods. These contributions underline the importance of understanding and optimizing the various components of model weights in fine-tuning, providing a foundation for more effective and efficient utilization of large pre-trained models across diverse NLP and multi-modal applications.
