ResLoRA: Identity Residual Mapping in Low-Rank Adaption (2402.18039v1)
Abstract: As one of the most popular parameter-efficient fine-tuning (PEFT) methods, low-rank adaptation (LoRA) is commonly applied to fine-tune large language models (LLMs). However, updating the weights of LoRA blocks effectively and expeditiously is challenging due to the long calculation path in the original model. To address this, we propose ResLoRA, an improved framework built on LoRA. By adding residual paths during training and using merging approaches to eliminate these extra paths during inference, our method can achieve better results in fewer training steps without any extra trainable parameters or inference cost compared to LoRA. The experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of our method. To the best of our knowledge, ResLoRA is the first work that combines the residual path with LoRA. The code of our method is available at https://github.com/microsoft/LMOps/tree/main/reslora .
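To make the abstract's idea concrete, below is a minimal PyTorch sketch of a LoRA-style linear layer with a residual path between consecutive LoRA blocks, in the spirit of ResLoRA's training-time shortcut. This is an illustrative assumption, not the authors' implementation; names such as `ResLoRALinear` and `prev_input` are hypothetical, and the released code at the repository above is the authoritative version.

```python
# Sketch: a frozen linear layer with a LoRA branch whose input can also include
# the previous LoRA block's input, shortening the gradient path during training.
import torch
import torch.nn as nn


class ResLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weights stay frozen
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # LoRA update starts from zero
        self.scaling = alpha / rank
        self.prev_input = None                # set externally to the previous block's input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual path during training: feed the previous block's input into
        # this block's low-rank branch in addition to the current input.
        lora_in = x if self.prev_input is None else x + self.prev_input
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(lora_in))
```

At inference time, as the abstract states, merging approaches remove the extra paths: the residual contribution and the low-rank update are folded back into the frozen weight, so the deployed layer costs the same as a plain merged LoRA layer. The specific merging algorithms are described in the paper and repository.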
Authors: Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang