Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis (2310.13669v1)

Published 20 Oct 2023 in cs.LG, cs.AI, cs.CL, and cs.PL

Abstract: The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a simple yet effective Actor-Critic RL training scheme and show that, in conjunction with automatically generated training data, it improves a pre-trained code LM's performance by up to 9.9% over the original underlying code synthesis LM, and up to 4.3% over RL-based models trained with standard PPO or CodeRL.
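The reward construction the abstract describes, executing a sampled program against its Unit Tests and scoring functional correctness, can be sketched as follows. This is a minimal Python illustration under assumed interfaces, not the paper's implementation: the function and variable names are hypothetical, and the binary pass/fail reward is a simplification (finer-grained schemes such as CodeRL's distinguish compile errors, runtime errors, and failing tests). In the paper's setting, the Unit Tests themselves are obtained automatically rather than hand-written.

import os
import subprocess
import sys
import tempfile

def unit_test_reward(candidate_source: str, test_source: str,
                     timeout: float = 5.0) -> float:
    # Illustrative sketch: run a generated function together with its
    # unit tests in a subprocess and return a scalar reward.
    # 1.0 if every assertion passes; 0.0 on any failure, error, or timeout.
    program = candidate_source + "\n\n" + test_source
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0
    finally:
        os.unlink(path)

# Toy usage: a sampled candidate and its (here, hand-written) unit tests.
candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n"
print(unit_test_reward(candidate, tests))  # 1.0

Such a reward can then drive an advantage-weighted Actor-Critic update of the general kind the abstract mentions. Again a toy sketch with illustrative names and a scalar critic, not the paper's training scheme:

import torch
import torch.nn.functional as F

def actor_critic_loss(token_logprobs: torch.Tensor,
                      value_estimate: torch.Tensor,
                      reward: float) -> torch.Tensor:
    # token_logprobs: log-probabilities of the sampled program's tokens
    #                 under the current policy LM, shape (seq_len,).
    # value_estimate: the critic's scalar prediction of the expected reward.
    # reward:         scalar unit-test reward for the sampled program.
    advantage = reward - value_estimate.detach()       # critic as baseline
    policy_loss = -(advantage * token_logprobs.sum())  # REINFORCE-style term
    critic_loss = F.mse_loss(value_estimate,
                             torch.tensor(reward))     # regress value to reward
    return policy_loss + critic_loss

# Toy usage with dummy tensors standing in for real model outputs:
logprobs = torch.log(torch.tensor([0.5, 0.25, 0.125]))
value = torch.tensor(0.4, requires_grad=True)
loss = actor_critic_loss(logprobs, value, reward=1.0)
loss.backward()

Detaching the critic's value estimate when forming the advantage keeps the baseline from receiving policy gradients; the critic is instead trained by regression toward the observed reward.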

References (32)
  1. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51(4):1–37.
  2. Program synthesis with large language models. arXiv preprint arXiv:2108.07732.
  3. Richard Bellman. 1957. A Markovian decision process. Indiana Univ. Math. J., 6(4):679–684.
  4. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
  5. CodeT: Code generation with generated tests. In The Eleventh International Conference on Learning Representations.
  6. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
  7. PanGu-Coder: Program synthesis with function-level language modeling.
  8. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, volume 1, page 2.
  9. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027.
  10. Matthew Hausknecht and Nolan Wagener. 2022. Consistent dropout for policy gradient reinforcement learning.
  11. Measuring coding challenge competence with APPS. arXiv preprint arXiv:2105.09938.
  12. The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
  13. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237–285.
  14. Pre-trained contextual embedding of source code.
  15. Interactive code generation via test-driven user-intent formalization. arXiv preprint arXiv:2208.05950.
  16. CodeRL: Mastering code generation through pretrained models and deep reinforcement learning. Advances in Neural Information Processing Systems, 35:21314–21328.
  17. Competition-level code generation with AlphaCode.
  18. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664.
  19. CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474.
  20. Improving language understanding by generative pre-training. Preprint.
  21. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  22. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551.
  23. Unsupervised translation of programming languages. Advances in Neural Information Processing Systems, 33:20601–20611.
  24. A survey of evaluation metrics used for NLG systems. ACM Computing Surveys (CSUR), 55(2):1–39.
  25. High-dimensional continuous control using generalized advantage estimation.
  26. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  27. Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement learning: An introduction. MIT Press.
  28. StructCoder: Structure-aware transformer for code generation. arXiv preprint arXiv:2206.05239.
  29. Attention is all you need. Advances in Neural Information Processing Systems, 30.
  30. Compilable neural code generation with compiler feedback. arXiv preprint arXiv:2203.05132.
  31. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859.
  32. Fine-tuning language models from human preferences.
Citations (3)
