QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning (2402.10462v1)

Published 16 Feb 2024 in cs.LG and cs.CL

Abstract: Finetuning LLMs requires huge GPU memory, restricting the choice of larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding an efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for lower ranks without further fine-tuning steps. This paper proposes QDyLoRA (Quantized Dynamic Low-Rank Adaptation) as an efficient quantization approach for dynamic low-rank adaptation. Motivated by Dynamic LoRA, QDyLoRA efficiently finetunes LLMs on a set of pre-defined LoRA ranks. QDyLoRA enables fine-tuning Falcon-40B for ranks 1 to 64 on a single 32 GB V100 GPU through one round of fine-tuning. Experimental results show that QDyLoRA is competitive with QLoRA and outperforms it when employing its optimal rank.
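
The core idea combines QLoRA-style fine-tuning of a frozen, quantized base model with DyLoRA-style dynamic rank sampling, so a single training run yields adapters usable at any rank in the pre-defined set. Below is a minimal sketch (not the authors' code) of that mechanism: a frozen base linear layer stands in for the NF4-quantized weight used in the paper, and the rank set, scaling factor, and sampling scheme are illustrative placeholders.

```python
# Minimal sketch of dynamic-rank LoRA as used conceptually in QDyLoRA.
# Assumption: the frozen base weight stands in for the quantized (NF4) weight;
# rank set, alpha, and init are placeholders, not the paper's exact settings.
import random
import torch
import torch.nn as nn

class DynamicLoRALinear(nn.Module):
    def __init__(self, in_features, out_features,
                 max_rank=64, ranks=(1, 2, 4, 8, 16, 32, 64), alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)          # frozen (quantized in QDyLoRA)
        self.lora_A = nn.Parameter(torch.randn(max_rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, max_rank))
        self.ranks = ranks
        self.alpha = alpha

    def forward(self, x, rank=None):
        # Training: sample a rank from the pre-defined set each step.
        # Inference: pick any rank up to max_rank without further fine-tuning.
        r = rank if rank is not None else random.choice(self.ranks)
        A, B = self.lora_A[:r, :], self.lora_B[:, :r]   # truncate to the sampled rank
        return self.base(x) + (self.alpha / r) * (x @ A.T @ B.T)

layer = DynamicLoRALinear(128, 128)
y_train = layer(torch.randn(4, 128))           # training-time call with a sampled rank
y_infer = layer(torch.randn(4, 128), rank=8)   # deploy-time call at a fixed low rank
```

Because the adapter matrices are truncated rather than re-trained per rank, the same checkpoint can be deployed at whichever rank best trades quality against memory.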

References (20)
  1. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. arXiv preprint arXiv:2012.13255.
  2. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
  3. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
  4. QLoRA: Efficient finetuning of quantized LLMs. arXiv preprint arXiv:2305.14314.
  5. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nature Machine Intelligence, 5(3):220–235.
  6. KronA: Parameter-efficient tuning with Kronecker adapter. arXiv preprint arXiv:2212.10650.
  7. Towards a unified view of parameter-efficient transfer learning. arXiv preprint arXiv:2110.04366.
  8. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
  9. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR.
  10. LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
  11. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
  12. OpenAssistant conversations: Democratizing large language model alignment. arXiv preprint arXiv:2304.07327.
  13. AlphaTuning: Quantization-aware parameter-efficient adaptation of large-scale pre-trained language models. arXiv preprint arXiv:2210.03858.
  14. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. Advances in Neural Information Processing Systems, 35:1950–1965.
  15. WebGLM: Towards an efficient web-enhanced question answering system with human preferences. arXiv preprint arXiv:2306.07906.
  16. UniPELT: A unified framework for parameter-efficient language model tuning. arXiv preprint arXiv:2110.07577.
  17. LST: Ladder side-tuning for parameter and memory efficient transfer learning. Advances in Neural Information Processing Systems, 35:12991–13005.
  18. Stanford Alpaca: An instruction-following LLaMA model.
  19. DyLoRA: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv preprint arXiv:2210.07558.
  20. Self-Instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560.
Citations (6)
