Let's Ask AI About Their Programs: Exploring ChatGPT's Answers To Program Comprehension Questions (2404.11734v1)
Abstract: Recent research has explored generating questions from code submitted by students. These Questions about Learners' Code (QLCs) are created by analyzing the program, exploring its execution paths, and then deriving code comprehension questions from these paths and from the broader code structure. Answering the questions requires reading and tracing the code, which is known to support students' learning. At the same time, computing education researchers have witnessed the emergence of large language models (LLMs) that have taken the community by storm. Researchers have demonstrated the applicability of these models, especially in the introductory programming context, outlining their performance in solving introductory programming problems and their utility in creating new learning resources. In this work, we explore the capability of state-of-the-art LLMs (GPT-3.5 and GPT-4) to answer QLCs generated from code that the LLMs themselves have created. Our results show that although state-of-the-art LLMs can create programs and trace program execution when prompted, they easily succumb to errors similar to those previously recorded for novice programmers. These results demonstrate the fallibility of these models and may temper the expectations fueled by the recent LLM hype. At the same time, we highlight future research possibilities, such as using LLMs to mimic students, as their behavior can indeed be similar for some specific tasks.
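The QLC generation process described above combines two sources of questions: the static structure of the program and the behavior of specific execution paths. As a rough illustration of that idea only, the Python sketch below is not the generator used by the authors. It builds both kinds of questions for a small program: structural questions from the syntax tree via the standard `ast` module, and a tracing question that counts how often a chosen line runs, recorded with `sys.settrace`. The learner program, the helper names `structural_questions` and `trace_question`, and the question wording are all hypothetical choices made for this example.

```python
import ast
import sys

# Hypothetical learner program; a real QLC system would analyse the student's own submission.
LEARNER_CODE = """\
def count_positives(numbers):
    count = 0
    for n in numbers:
        if n > 0:
            count += 1
    return count
"""


def structural_questions(source):
    """Hypothetical helper: questions answerable from the code's static structure (AST)."""
    questions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            questions.append(
                (f"Which are the parameter names of function '{node.name}'?", params))
        elif isinstance(node, ast.For):
            questions.append(("On which line does the for loop begin?", node.lineno))
    return questions


def trace_question(source, func_name, args, target_line):
    """Hypothetical helper: a question answerable only by tracing one execution path."""
    namespace = {}
    exec(compile(source, "<learner>", "exec"), namespace)
    hits = 0

    def tracer(frame, event, arg):
        nonlocal hits
        if frame.f_code.co_filename != "<learner>":
            return None  # do not trace frames outside the learner's code
        if event == "line" and frame.f_lineno == target_line:
            hits += 1
        return tracer

    sys.settrace(tracer)
    try:
        namespace[func_name](*args)
    finally:
        sys.settrace(None)  # always restore normal (untraced) execution
    call = f"{func_name}({', '.join(repr(a) for a in args)})"
    return (f"How many times is line {target_line} executed when calling {call}?", hits)


for question, answer in structural_questions(LEARNER_CODE):
    print(question, "->", answer)
print(*trace_question(LEARNER_CODE, "count_positives", [[3, -1, 0, 5]], 5), sep=" -> ")
```

Each question is paired with the answer computed from the code itself. For the input `[3, -1, 0, 5]`, line 5 (`count += 1`) runs twice, because only the two positive values pass the `if` test. Keeping static and dynamic questions in separate helpers mirrors the distinction between code structure and execution paths drawn above.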