"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students (2402.01687v2)
Abstract: This study evaluates how effectively various LLMs perform tasks common among undergraduate computer science students. Although several studies in the computing education community have explored using LLMs for a variety of tasks, there is a lack of comprehensive research that compares different LLMs and evaluates which are most effective for which tasks. Our research systematically assesses publicly available LLMs such as Google Bard, ChatGPT (3.5), GitHub Copilot Chat, and Microsoft Copilot across diverse tasks commonly encountered by undergraduate computer science students in India. These tasks include code explanation and documentation, solving class assignments, technical interview preparation, learning new concepts and frameworks, and email writing. The evaluation was carried out by pre-final-year and final-year undergraduate computer science students and provides insights into the models' strengths and limitations. This study aims to guide students and instructors in selecting a suitable LLM for a given task and offers insights on how students and instructors can use LLMs constructively.