PerfRL: A Small Language Model Framework for Efficient Code Optimization (2312.05657v2)
Abstract: Code optimization is a challenging task requiring substantial expertise from developers. Human capacity alone, however, cannot keep pace with the rapid evolution of hardware architectures and software environments. In light of this, recent research proposes adopting machine learning and artificial intelligence techniques to automate the code optimization process. In this paper, we introduce PerfRL, an innovative framework designed to tackle the problem of code optimization. Our framework leverages the capabilities of small language models (SLMs) and reinforcement learning (RL), enabling SLMs to assimilate feedback from their environment, notably unit-test results, during the fine-tuning phase. When benchmarked against existing models, PerfRL demonstrates superior efficiency in speed and computational resource usage, owing to its reduced need for training steps and its compatibility with SLMs. It also substantially diminishes the risk of logical and syntactical errors. To evaluate our framework, we conduct experiments on the PIE dataset using a lightweight language model (CodeT5) and a recent reinforcement learning algorithm, RRHF. We evaluate with metrics for optimization quality and speedup. The results show that our approach achieves similar or better results than state-of-the-art models while using shorter training times and smaller pre-trained models.
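To make the training signal concrete, the sketch below shows how a ranking objective in the style of RRHF (Yuan et al., 2023) can be driven by unit-test feedback: candidate optimized programs sampled from the model are scored by test correctness and measured speedup, and the model is trained so that its likelihoods rank the candidates accordingly. This is a minimal sketch assuming a HuggingFace-style CodeT5 checkpoint and single-example, unpadded batches; the reward shaping and function names are our illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a unit-test-driven RRHF objective (Yuan et al., 2023)
# as a PerfRL-style framework might apply it, assuming a HuggingFace
# T5ForConditionalGeneration (e.g., CodeT5) and unpadded single-example
# batches. Reward shaping here is an illustrative assumption.
import torch
import torch.nn.functional as F

def sequence_log_prob(model, input_ids, candidate_ids):
    """Length-normalized log-probability of a candidate program."""
    logits = model(input_ids=input_ids, labels=candidate_ids).logits
    logp = F.log_softmax(logits, dim=-1)                        # (1, T, vocab)
    token_logp = logp.gather(-1, candidate_ids.unsqueeze(-1)).squeeze(-1)
    return token_logp.mean()  # RRHF normalizes by sequence length

def rrhf_loss(model, input_ids, candidates, rewards):
    """Rank model likelihoods by reward (e.g., test pass rate + speedup)."""
    probs = [sequence_log_prob(model, input_ids, c) for c in candidates]
    # Ranking term: penalize every pair where a lower-reward candidate
    # is assigned a higher likelihood than a higher-reward one.
    rank_loss = sum(
        torch.clamp(probs[i] - probs[j], min=0.0)
        for i in range(len(candidates))
        for j in range(len(candidates))
        if rewards[i] < rewards[j]
    )
    # SFT term: maximize likelihood of the best-scoring candidate.
    best = max(range(len(candidates)), key=lambda k: rewards[k])
    return rank_loss - probs[best]
```

In this sketch, `rewards` would come from executing each candidate against the problem's unit tests and timing it relative to the input program, so candidates that fail tests receive the lowest reward and the ranking term pushes probability mass toward fast, correct programs. Length normalization keeps longer candidates from being penalized merely for having more tokens.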
- Training a helpful and harmless assistant with reinforcement learning from human feedback, 2022.
- Learning to superoptimize programs. CoRR, abs/1611.01787, 2016. URL http://arxiv.org/abs/1611.01787.
- Evaluating large language models trained on code, 2021.
- ProGraML: Graph-based deep learning for program optimization and analysis, 2020.
- The three pillars of machine programming. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, MAPL 2018, pp. 69–80, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450358347. doi: 10.1145/3211346.3211355. URL https://doi.org/10.1145/3211346.3211355.
- Measuring coding challenge competence with APPS. NeurIPS, 2021.
- CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019.
- CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. In Oh, A. H., Agarwal, A., Belgrave, D., and Cho, K. (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=WaGvb7OzySA.
- RLTF: Reinforcement learning from unit test feedback, 2023.
- CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR, abs/2102.04664, 2021.
- Learning performance-improving code edits. arXiv preprint arXiv:2302.07867, 2023.
- CodeGen2: Lessons for training LLMs on programming and natural languages. ICLR, 2023a.
- CodeGen: An open large language model for code with multi-turn program synthesis. ICLR, 2023b.
- CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks, 2021.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020. URL http://jmlr.org/papers/v21/20-074.html.
- Proximal policy optimization algorithms, 2017.
- PanGu-Coder2: Boosting large language models for code with ranking feedback, 2023.
- Execution-based code generation using deep reinforcement learning, 2023.
- Preference ranking optimization for human alignment, 2023.
- Learning to summarize from human feedback. CoRR, abs/2009.01325, 2020. URL https://arxiv.org/abs/2009.01325.
- Attention is all you need. CoRR, abs/1706.03762, 2017. URL http://arxiv.org/abs/1706.03762.
- CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 8696–8708, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.emnlp-main.685. URL https://aclanthology.org/2021.emnlp-main.685.
- Reinforcement learning from diverse human preferences, 2023.
- RRHF: Rank responses to align language models with human feedback without tears, 2023.
- Siren's song in the AI ocean: A survey on hallucination in large language models, 2023.