Emergent Mind

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

(2405.17604)
Published May 27, 2024 in cs.LG, cs.AI, and cs.CL

Abstract

The recent trend in scaling language models has led to a growing demand for parameter-efficient tuning (PEFT) methods such as LoRA (Low-Rank Adaptation). LoRA consistently matches or surpasses the full fine-tuning baseline with fewer parameters. However, handling numerous task-specific or user-specific LoRA modules on top of a base model still presents significant storage challenges. To address this, we introduce LoRA-XS (Low-Rank Adaptation with eXtremely Small number of parameters), a novel approach leveraging Singular Value Decomposition (SVD) for parameter-efficient fine-tuning. LoRA-XS introduces a small r x r weight matrix between frozen LoRA matrices, which are constructed by SVD of the original weight matrix. Training only r x r weight matrices ensures independence from model dimensions, enabling more parameter-efficient fine-tuning, especially for larger models. LoRA-XS achieves a remarkable reduction of trainable parameters by over 100x in 7B models compared to LoRA. Our benchmarking across various scales, including GLUE, GSM8k, and MATH benchmarks, shows that our approach outperforms LoRA and recent state-of-the-art approaches like VeRA in terms of parameter efficiency while maintaining competitive performance.

LoRA-XS employs a small trainable matrix R between frozen low-rank matrices derived from SVD.

Overview

  • The LoRA-XS method offers an efficient approach to fine-tuning LLMs by dramatically reducing the number of trainable parameters while maintaining high performance.

  • LoRA-XS uses Singular Value Decomposition (SVD) to construct frozen low-rank matrices, between which a minimal trainable weight matrix is inserted, significantly improving storage and computational efficiency.

  • Empirical results on benchmarks such as GLUE, GSM8K, and MATH, using models including RoBERTa-large, Mistral-7B, and Gemma-7B, illustrate LoRA-XS's competitive performance compared to traditional methods, with substantial reductions in trainable parameters.

The paper "LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters" addresses a critical concern in the field of NLP and, specifically, in the adaptation of LLMs. The proposed method, LoRA-XS, enhances parameter-efficient tuning approaches by significantly reducing the number of trainable parameters while maintaining competitive performance across various NLP benchmarks.

In recent years, the advent and scaling of LLMs have driven remarkable advancements in NLP. However, the sheer size of these models introduces extensive challenges, particularly when it comes to fine-tuning them for specific downstream tasks. Traditional fine-tuning methods necessitate updating a vast number of parameters, leading to substantial computational and storage demands. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) emerged as viable solutions, significantly reducing the number of trainable parameters. Despite their success, even methods like LoRA face storage challenges, especially when handling large-scale personalized or task-specific models.

Methodology

LoRA-XS leverages Singular Value Decomposition (SVD) to address these limitations. The core idea is to place a small $r \times r$ trainable weight matrix between two frozen low-rank matrices derived from the original pretrained weight matrix via truncated SVD. This decouples the number of trainable parameters from the model dimensions, yielding a reduction of over 100x relative to LoRA for 7B-scale models. The LoRA-XS method comprises the following key steps:

  1. SVD-Based Initialization: The pretrained weight matrix $W$ undergoes truncated SVD, $W \approx U_r \Sigma_r V_r^\top$, keeping only the top $r$ singular values and vectors.
  2. Frozen Adaptation Matrices: The truncated factors yield low-rank adaptation matrices $A$ and $B$, which remain frozen during training.
  3. Introduction of Trainable Matrix: A small $r \times r$ trainable matrix $R$ is placed between $A$ and $B$, acting as the only trainable component, thereby significantly reducing the parameter count.

This novel approach not only improves parameter efficiency but also enhances flexibility, as the number of trainable parameters can be precisely controlled based on the downstream task requirements.
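The construction above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: transformer-specific details are omitted, and $R$ is initialized to zero here so that the adapted layer initially reproduces the pretrained one (a convention of this sketch).

```python
import numpy as np

def lora_xs_init(W, r):
    """Build the frozen factors A, B and the trainable r x r matrix R
    from a truncated SVD of the pretrained weight W (shape m x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # frozen: top-r left singular vectors scaled by singular values (m x r)
    B = Vt[:r, :]          # frozen: top-r right singular vectors (r x n)
    R = np.zeros((r, r))   # trainable: the only parameters updated during fine-tuning
    return A, R, B

def lora_xs_forward(x, W, A, R, B):
    """Adapted layer: y = x W^T + x (A R B)^T, i.e. the weight update is dW = A @ R @ B."""
    return x @ W.T + x @ (A @ R @ B).T
```

With $R = 0$ the update vanishes, so fine-tuning starts exactly from the pretrained weights, and the trainable parameter count per adapted matrix is $r^2$, independent of the layer dimensions $m$ and $n$.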

Experiments and Results

GLUE Benchmark

The paper evaluates LoRA-XS on the GLUE benchmark using the RoBERTa-large model and compares it against full fine-tuning (FT), LoRA, and VeRA. The results reveal that LoRA-XS with ranks ranging from 4 to 25 outperforms both LoRA and VeRA in parameter efficiency while maintaining high performance. For instance, LoRA-XS with a rank of 16 achieved superior accuracy over VeRA and LoRA, with a 2.5x and 30x reduction in trainable parameters, respectively.
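The reported 30x figure aggregates over all adapted modules at the ranks the paper uses; a back-of-envelope count for a single square weight matrix shows where the savings come from. The dimensions and ranks below are illustrative assumptions, with d = 1024 matching RoBERTa-large's hidden size.

```python
# Trainable parameters for one adapted d x d weight matrix.
# d matches RoBERTa-large's hidden size; the ranks are illustrative.
d = 1024
lora_rank, xs_rank = 8, 16

lora_params = lora_rank * (d + d)  # LoRA trains A (d x r) and B (r x d)
xs_params = xs_rank ** 2           # LoRA-XS trains only the r x r matrix R

print(lora_params, xs_params, lora_params // xs_params)  # 16384 256 64
```

Because LoRA's cost scales with the layer width $d$ while LoRA-XS's cost is a constant $r^2$, the per-matrix gap widens further as models grow.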

Instruction Tuning

The method's efficacy in instruction tuning is also validated on the Mistral-7B and Gemma-7B models, fine-tuned on the MetaMathQA dataset and evaluated on the GSM8K and MATH benchmarks. LoRA-XS demonstrated performance competitive with both full fine-tuning and LoRA while using drastically fewer trainable parameters. For example, LoRA-XS with only 0.92M parameters performed comparably to LoRA with 168M parameters on both benchmarks.

Ablation Study

An important aspect of the paper is the ablation study comparing SVD-based initialization versus random initialization. The results underscore the benefits of aligning adaptation matrices with the top principal components of the pretrained weights. The SVD-based initialization not only led to better final performance but also accelerated convergence, particularly with smaller ranks.
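The intuition behind this result can be illustrated numerically: by the Eckart–Young theorem, the truncated SVD gives the best rank-$r$ approximation of $W$, so frozen factors built from it capture more of the pretrained weight than random factors of the same rank. The toy comparison below uses assumed sizes and is not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
r = 8

# Rank-r factors from truncated SVD (aligned with top principal components).
U, S, Vt = np.linalg.svd(W, full_matrices=False)
svd_err = np.linalg.norm(W - (U[:, :r] * S[:r]) @ Vt[:r, :])

# Random rank-r factors of comparable scale.
A = rng.standard_normal((64, r)) / np.sqrt(r)
B = rng.standard_normal((r, 64))
rand_err = np.linalg.norm(W - A @ B)

print(svd_err < rand_err)  # truncated SVD is the optimal rank-r approximation
```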

Implications and Future Directions

LoRA-XS's implications are substantial for both theoretical and practical aspects of AI and NLP. The method addresses the scalability issue of PEFT methods by making trainable parameters independent of model dimensions, particularly beneficial for large-scale models. From a practical standpoint, this reduces the computational and storage overhead, paving the way for more efficient deployments of LLMs in real-world applications.

The theoretical implication lies in the innovative use of SVD for initializing adaptation matrices, suggesting a shift toward more informed initialization strategies in deep learning. Future research could extend LoRA-XS to other architectures and tasks, including reinforcement learning and multimodal models, and could investigate combining LoRA-XS with other memory-saving techniques such as model quantization.

Conclusion

In conclusion, LoRA-XS presents a significant advancement in parameter-efficient fine-tuning. By leveraging SVD and introducing a minimal number of trainable parameters, it addresses the critical scalability and storage challenges faced by LLMs. The empirical results across various benchmarks and models highlight its efficiency and potential, marking a meaningful contribution to the field of NLP and model optimization.
