CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs (2401.11314v2)
Abstract: Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but they reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant that delivers helpful, technically correct responses without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates students' incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. We performed a thematic analysis of 8,000 CodeAid usages, further enriched by weekly surveys and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's unique benefits; D2) simplifying query formulation while promoting cognitive engagement; D3) avoiding direct responses while encouraging motivated learning; and D4) maintaining transparency and control for students to assess and steer AI responses.
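As an illustration of the "helpful responses without code solutions" behavior described in the abstract, the minimal sketch below shows one way such a guardrail could be approximated with a system prompt to a chat LLM. This is not CodeAid's actual implementation: the prompt wording, model name, and use of the OpenAI chat-completions endpoint are assumptions made purely for this example.

```python
# Illustrative sketch only: a guarded "help without revealing code" prompt.
# NOT CodeAid's actual implementation; prompt wording and model are assumptions.
import os
import requests

GUARDRAIL_SYSTEM_PROMPT = (
    "You are a teaching assistant for an introductory programming course. "
    "Never write runnable code for the student. Instead: answer conceptual "
    "questions, outline solutions as numbered pseudo-code with a one-line "
    "explanation per step, and when given buggy code, point out the "
    "problematic lines and suggest fixes in plain language."
)

def ask_assistant(student_question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send a student's question to a chat LLM with the guardrail prompt."""
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": GUARDRAIL_SYSTEM_PROMPT},
                {"role": "user", "content": student_question},
            ],
            "temperature": 0.2,  # keep answers focused and consistent
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask_assistant("Why does my loop never stop when I write 'while (i = 10)'?"))
```

In practice, a deployed assistant would also need prompt hardening against students asking for full solutions, logging for the kind of usage analysis the paper reports, and post-processing to catch responses that slip into runnable code.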