Software Testing with Large Language Models: Survey, Landscape, and Vision (2307.07221v3)
Abstract: Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making the field ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 102 relevant studies that have used LLMs for software testing, from both the software testing and LLM perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative. It also analyzes the commonly used LLMs, the types of prompt engineering employed, and the techniques that accompany these LLMs. Finally, it summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in the area, highlighting potential avenues for exploration and identifying gaps in our current understanding of the use of LLMs in software testing.