Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models (2401.00788v1)
Abstract: The high cost of full-parameter fine-tuning (FFT) of LLMs has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 datasets spanning both code comprehension and code generation, we find that FFT generally leads to the best downstream performance across all scales, and that PEFT methods differ significantly in their efficacy depending on model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on model robustness and code security reveals that larger models tend to be less robust and less secure. Finally, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and that validation loss during instruction tuning can be a reliable indicator of overall downstream performance.
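The abstract contrasts full-parameter fine-tuning with PEFT methods such as LoRA, which freeze the base weights and train only small injected low-rank modules. Below is a minimal sketch of how such a setup is commonly configured, assuming the Hugging Face `transformers` and `peft` libraries; the base model, target modules, and hyperparameters are illustrative placeholders, not the paper's exact training configuration.

```python
# Minimal LoRA instruction-tuning setup sketch (illustrative, not the paper's exact recipe).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "bigcode/starcoderbase-1b"  # hypothetical choice of a small Code LLM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adds trainable low-rank matrices alongside selected projection layers
# while the original pretrained weights stay frozen.
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update (assumed value)
    lora_alpha=16,              # scaling factor (assumed value)
    target_modules=["c_attn"],  # module names depend on the model architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically a small fraction of all weights
```

The resulting model can then be trained with a standard `transformers.Trainer` on an instruction dataset; only the adapter weights receive gradient updates, which is what makes the approach parameter-efficient relative to FFT.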
Authors: Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff