LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models

Abstract

The success of large language models (LLMs) such as GPT-4 and ChatGPT has led to the development of numerous cost-effective and accessible alternatives, created by fine-tuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive, as it requires fine-tuning only a few external parameters rather than the entire LLM while achieving comparable or even better performance. To enable further research on PEFT methods for LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning, and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyperparameters to find the best design for each adapter-based method. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on both reasoning tasks.
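To make the adapter families listed above concrete, here is a minimal, hypothetical PyTorch sketch of two of them: a LoRA-style reparametrization that adds a trainable low-rank update to a frozen linear layer, and a Series bottleneck adapter applied after a sublayer output. This is an illustration of the general techniques, not the LLM-Adapters framework's actual API; the class names, rank r, and hidden sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Reparametrization-based adapter (LoRA-style): a frozen pretrained
    linear layer augmented with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x; only A and B receive gradients
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


class SeriesAdapter(nn.Module):
    """Series (bottleneck) adapter: down-project, non-linearity, up-project,
    with a residual connection, inserted after a Transformer sublayer."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))


# Wrap one (illustrative) 4096-dimensional projection and count what trains.
base = nn.Linear(4096, 4096)
peft_layer = LoRALinear(base, r=8)
trainable = sum(p.numel() for p in peft_layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in peft_layer.parameters())
print(f"trainable parameters: {trainable} / {total}")
```

Only the low-rank factors (and, for the series adapter, the bottleneck projections) receive gradients; the frozen backbone is untouched, which is what keeps the number of extra trainable parameters small relative to the 7B-parameter models evaluated in the paper.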

References
  1. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Annual Meeting of the Association for Computational Linguistics.
  2. PIQA: Reasoning about physical commonsense in natural language. In Thirty-Fourth AAAI Conference on Artificial Intelligence.
  3. Parameter-Efficient Fine-Tuning Design Spaces
  4. BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2924–2936, Minneapolis, Minnesota. Association for Computational Linguistics.
  5. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
  6. Training Verifiers to Solve Math Word Problems
  7. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  8. KronA: Parameter Efficient Tuning with Kronecker Adapter
  9. Learn-to-share: A hardware-friendly transfer learning framework exploiting computation and parameter sharing. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 3469–3479. PMLR.
  10. Towards a Unified View of Parameter-Efficient Transfer Learning
  11. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations.
  12. SparseAdapter: An easy approach for improving the parameter-efficiency of adapters. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2184–2190, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  13. Compacter: Efficient low-rank hypercomplex adapter layers. In Advances in Neural Information Processing Systems.
  14. Learning to solve arithmetic word problems with verb categorization. In EMNLP, pages 523–533.
  15. Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning.
  16. LoRA: Low-Rank Adaptation of Large Language Models
  17. Large Language Models are Zero-Shot Reasoners
  18. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics, 3:585–597.
  19. MAWPS: A math word problem repository. In Proceedings of NAACL, pages 1152–1157.
  20. The Power of Scale for Parameter-Efficient Prompt Tuning
  21. Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online. Association for Computational Linguistics.
  22. Program induction by rationale generation: Learning to solve and explain algebraic word problems. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 158–167.
  23. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft.
  24. UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning
  25. Crosslingual Generalization through Multitask Finetuning
  26. OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt.
  27. GPT-4 Technical Report
  28. Are NLP models really able to solve simple math word problems? In Proceedings of NAACL, pages 2080–2094.
  29. MAD-X: An adapter-based framework for multi-task cross-lingual transfer. In Conference on Empirical Methods in Natural Language Processing.
  30. Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
  31. Exploring universal intrinsic task subspace via prompt tuning. arXiv preprint.
  32. Solving General Arithmetic Word Problems
  33. WinoGrande: An adversarial Winograd schema challenge at scale. Communications of the ACM, 64(9):99–106.
  34. SocialIQA: Commonsense Reasoning about Social Interactions
  35. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  36. LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
  37. Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca.
  38. LLaMA: Open and Efficient Foundation Language Models
  39. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  40. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
  41. Ben Wang and Aran Komatsuzaki. 2021. GPT-J-6B: A 6 billion parameter autoregressive language model. https://github.com/kingoflolz/mesh-transformer-jax.
  42. Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
  43. AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
  44. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
