MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (2405.12130v1)
Abstract: Low-rank adaptation is a popular parameter-efficient fine-tuning method for LLMs. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve this, we introduce corresponding non-parameterized operators to reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into LLMs, which allows our method to be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory, and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on the other tasks.
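To make the parameter accounting concrete, here is a minimal PyTorch sketch of a MoRA-style adapter around a frozen linear layer. The class name, the grouped-sum compression, and the tiling decompression are illustrative assumptions, not the paper's exact operators (the paper explores several non-parameterized choices, including rotation-based ones); only the square matrix `M` is trained, sized so its parameter count matches a LoRA adapter of the same rank budget.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class MoRALinearSketch(nn.Module):
    """Minimal sketch of a MoRA-style adapter on top of a frozen linear layer.

    Only the square matrix ``M`` is trainable; its size is chosen so the
    trainable-parameter count matches a LoRA adapter of rank ``lora_rank``
    (r * (d_in + d_out) parameters). The compress/decompress operators below
    (grouped sum and tiling) are illustrative assumptions.
    """

    def __init__(self, base: nn.Linear, lora_rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pretrained weight stays frozen

        d_in, d_out = base.in_features, base.out_features
        # r_hat^2 ~= r * (d_in + d_out): same budget as LoRA's two low-rank factors.
        self.r_hat = int(math.sqrt(lora_rank * (d_in + d_out)))
        # Zero init so training starts from the unmodified base model.
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))

    def compress(self, x: torch.Tensor) -> torch.Tensor:
        # Reduce the last dimension from d_in to r_hat by summing r_hat-sized chunks.
        pad = (-x.shape[-1]) % self.r_hat
        x = F.pad(x, (0, pad))
        return x.reshape(*x.shape[:-1], -1, self.r_hat).sum(dim=-2)

    def decompress(self, y: torch.Tensor) -> torch.Tensor:
        # Expand the last dimension from r_hat to d_out by tiling and truncating.
        d_out = self.base.out_features
        reps = math.ceil(d_out / self.r_hat)
        return y.repeat(*([1] * (y.dim() - 1)), reps)[..., :d_out]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the high-rank update decompress(compress(x) @ M).
        return self.base(x) + self.decompress(self.compress(x) @ self.M)


# Usage: with d_in = d_out = 4096 and lora_rank = 8, r_hat = 256, so M holds
# 256 * 256 = 65,536 trainable parameters -- the same as LoRA's 8 * (4096 + 4096).
layer = nn.Linear(4096, 4096, bias=False)
mora = MoRALinearSketch(layer, lora_rank=8)
out = mora(torch.randn(2, 16, 4096))
```

Because the compression and decompression operators are linear and parameter-free, the update applied by `M` can be expressed as a d_out × d_in matrix and merged into the base weight after training, which is why MoRA can be deployed like LoRA.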
- DISC-FinLLM: A Chinese financial large language model based on multiple experts fine-tuning. arXiv preprint arXiv:2310.15205.
- ConvFinQA: Exploring the chain of numerical reasoning in conversational finance question answering. arXiv preprint arXiv:2210.03849.
- Adapting large language models via reading comprehension. arXiv preprint arXiv:2309.09530.
- Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168.
- Evaluating language models for mathematics through interactions. arXiv preprint arXiv:2306.01694.
- Franck Dernoncourt and Ji Young Lee. 2017. PubMed 200k RCT: A dataset for sequential sentence classification in medical abstracts. arXiv preprint arXiv:1710.06071.
- QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36.
- Specializing smaller language models towards multi-step reasoning. In International Conference on Machine Learning, pages 10421–10430. PMLR.
- The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027.
- MedAlpaca: An open-source collection of medical conversational AI models and training data. arXiv preprint arXiv:2304.08247.
- LoRA+: Efficient low rank adaptation of large models.
- Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
- Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874.
- Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning, pages 2790–2799. PMLR.
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685.
- MathPrompter: Mathematical reasoning using large language models. arXiv preprint arXiv:2303.05398.
- Camels in a changing climate: Enhancing LM adaptation with Tulu 2. arXiv preprint arXiv:2311.10702.
- What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421.
- PubMedQA: A dataset for biomedical research question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2567–2577.
- VeRA: Vector-based random matrix adaptation. arXiv preprint arXiv:2310.11454.
- The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.
- Stack more layers differently: High-rank training through low-rank updates. arXiv preprint arXiv:2307.05695.
- ChipNeMo: Domain-adapted LLMs for chip design. arXiv preprint arXiv:2311.00176.
- DoRA: Weight-decomposed low-rank adaptation. arXiv preprint arXiv:2402.09353.
- WWW'18 open challenge: Financial opinion mining and question answering. In Companion Proceedings of The Web Conference 2018, pages 1941–1942.
- Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4):782–796.
- PeriodicLoRA: Breaking the low-rank bottleneck in LoRA optimization. arXiv preprint arXiv:2402.16141.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
- Tied-LoRA: Enhancing parameter efficiency of LoRA with weight tying. arXiv preprint arXiv:2311.09578.
- Domain adaption of named entity recognition to support credit risk assessment. In Proceedings of the Australasian Language Technology Association Workshop 2015, pages 84–90, Parramatta, Australia.
- Noam Shazeer. 2020. GLU variants improve transformer. arXiv preprint arXiv:2002.05202.
- ResLoRA: Identity residual mapping in low-rank adaption. arXiv preprint arXiv:2402.18039.
- Ankur Sinha and Tanmay Khandait. 2021. Impact of news on the commodity market: Dataset and results. In Advances in Information and Communication: Proceedings of the 2021 Future of Information and Communication Conference (FICC), Volume 2, pages 589–601. Springer.
- RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063.
- GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.
- How far can camels go? exploring the state of instruction tuning on open resources. Advances in Neural Information Processing Systems, 36.
- BloombergGPT: A large language model for finance. arXiv preprint arXiv:2303.17564.
- MetaMath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284.
- Biao Zhang and Rico Sennrich. 2019. Root mean square layer normalization. Advances in Neural Information Processing Systems, 32.
- LIMA: Less is more for alignment. Advances in Neural Information Processing Systems, 36.
- Asymmetry in low-rank adapters of foundation models. arXiv preprint arXiv:2402.16842.
Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang