Emergent Mind

Abstract

This study evaluates the effectiveness of various LLMs in performing tasks common among undergraduate computer science students. Although a number of studies in the computing education community have explored the use of LLMs for a variety of tasks, comprehensive research comparing different LLMs and identifying which are most effective for which tasks is lacking. Our research systematically assesses four publicly available LLMs—Google Bard, ChatGPT (3.5), GitHub Copilot Chat, and Microsoft Copilot—across diverse tasks commonly encountered by undergraduate computer science students in India. These tasks include code explanation and documentation, solving class assignments, technical interview preparation, learning new concepts and frameworks, and email writing. Evaluation of these tasks was carried out by pre-final-year and final-year undergraduate computer science students and provides insights into the models' strengths and limitations. This study aims to guide students as well as instructors in selecting a suitable LLM for a specific task and offers valuable insights into how LLMs can be used constructively by students and instructors.

Overview

  • This study compares various LLMs—Google Bard, ChatGPT, GitHub Copilot Chat, and Microsoft Copilot—on tasks typical for computer science students, including code generation, project ideation, exam prep, and email writing.

  • The LLMs were evaluated through a mix of quantitative and qualitative analyses based on their effectiveness in facilitating tasks across code explanation, class assignments, technical interview prep, learning new concepts, and writing emails.

  • Key findings indicate no single LLM dominates across all tasks; Microsoft Copilot and GitHub Copilot Chat excel in code-related tasks, Google Bard is best for learning new concepts, and ChatGPT shines in writing emails.

  • The study highlights the importance of choosing the right LLM for specific tasks and suggests that future research could explore domain-specific LLMs or include emerging LLMs to keep the guide updated.

Evaluating LLMs for Undergraduate Computer Science Tasks

Introduction

The utilization of LLMs in educational contexts, particularly in undergraduate computer science programs, has gained substantial attention. This study sets out to compare and evaluate the effectiveness of various publicly available LLMs—Google Bard, ChatGPT, GitHub Copilot Chat, and Microsoft Copilot—in facilitating tasks commonly performed by computer science students. These tasks span a range of activities, including code generation, project ideation, exam preparation, and email composition. Given the rapid expansion of LLMs and their application potential, this research offers valuable insights for students and educators in identifying the most suitable LLMs for specific educational tasks.

Methodology

The methodology employed in this study involves a mix of quantitative and qualitative analysis of four leading LLMs on tasks commonly encountered by computer science students. These tasks were evaluated by both pre-final-year and final-year computer science students and encompass:

  • Code Explanation and Documentation
  • Class Assignments across Programming, Theoretical, and Humanities contexts
  • Technical Interview Preparation
  • Learning New Concepts and Frameworks
  • Writing Emails

The LLMs were assessed based on their ability to provide clear, accurate, and helpful responses across these tasks, with performance rated on a scale from 1 to 10.
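The aggregation implied by this rating scheme can be sketched as follows. The ratings below are purely illustrative placeholders (the study's actual scores and evaluator counts are not reproduced here); the sketch only shows how per-task mean ratings would identify a top model for each task.

```python
from statistics import mean

# Hypothetical 1-10 ratings from student evaluators.
# Model names match the study; the scores are invented for illustration.
ratings = {
    "code_explanation": {
        "Google Bard": [6, 7], "ChatGPT": [7, 7],
        "GitHub Copilot Chat": [8, 8], "Microsoft Copilot": [9, 8],
    },
    "email_writing": {
        "Google Bard": [7, 6], "ChatGPT": [9, 9],
        "GitHub Copilot Chat": [6, 5], "Microsoft Copilot": [7, 7],
    },
}

def best_llm_per_task(ratings):
    """For each task, return the LLM with the highest mean rating."""
    return {
        task: max(scores, key=lambda llm: mean(scores[llm]))
        for task, scores in ratings.items()
    }

print(best_llm_per_task(ratings))
```

With the placeholder numbers above, this prints a mapping such as `{'code_explanation': 'Microsoft Copilot', 'email_writing': 'ChatGPT'}`, mirroring the kind of per-task winner the study reports.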

Key Findings

The study revealed that no single LLM outperforms others across all assessed tasks.

  • For code explanation and documentation, Microsoft Copilot excelled, indicating its robustness in dealing with a wide range of programming languages and presenting comprehensive code insights.
  • In class assignments, GitHub Copilot Chat led in programming assignments, leveraging its programming-centric design, whereas Microsoft Copilot was the frontrunner in both theoretical and humanities assignments, showcasing its versatility.
  • For technical interview preparation, both GitHub Copilot Chat and ChatGPT demonstrated high performance, suggesting that these models are particularly adept at solving algorithmic problems.
  • In aiding the learning of new concepts and frameworks, Google Bard emerged as the most effective, offering clear and insightful explanations that facilitate deeper understanding.
  • When it came to writing emails, ChatGPT was found to be superior, indicating its strength in generating contextually relevant and well-structured content.
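The findings above can be distilled into a simple lookup a student might use when picking a model for a task. The task labels are our own shorthand, not the paper's exact wording, and the mapping reflects only the per-task winners summarized in this section.

```python
# Task-to-model recommendations distilled from the study's findings.
# Task keys are illustrative labels chosen for this sketch.
RECOMMENDED = {
    "code_explanation": "Microsoft Copilot",
    "programming_assignment": "GitHub Copilot Chat",
    "theoretical_assignment": "Microsoft Copilot",
    "humanities_assignment": "Microsoft Copilot",
    "interview_prep": "GitHub Copilot Chat",  # ChatGPT also performed well here
    "learning_new_concepts": "Google Bard",
    "email_writing": "ChatGPT",
}

def recommend(task: str) -> str:
    """Return the study's top-rated LLM for a task, or a fallback note."""
    return RECOMMENDED.get(task, "no clear winner; compare outputs yourself")

print(recommend("email_writing"))
```

The fallback branch reflects the study's central caveat: outside the evaluated tasks, no single model dominates, so comparing outputs directly remains the safest default.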

Implications

This research underscores the diverse capabilities of current LLMs, suggesting that students and educators could benefit from choosing specific LLMs tailored to the needs of their tasks. It also highlights the importance of understanding the limitations and strengths of each LLM, advocating for a more informed selection process to optimize their utility in educational settings.

The findings further hint at the potential of LLMs to redefine the educational landscape, offering personalized assistance in learning new concepts, preparing for interviews, and handling assignments. However, the study also cautions against over-reliance on these models, given their varying reliability across different tasks.

Future Directions

The rapidly evolving field of LLMs promises the introduction of more advanced models. Future work could extend this research to include upcoming LLMs, offering a dynamic and updated guide for their application in education. It also opens the door to developing domain-specific LLMs, fine-tuned to meet the nuanced requirements of educational contexts, particularly in computer science education.

Conclusion

In conclusion, this study presents a comprehensive evaluation of the performance of four major LLMs in tasks common to the undergraduate computer science curriculum. The varied performance across different tasks underscores the necessity of selecting LLMs based on the specific needs of the task at hand. As the development of LLMs continues to advance, this research provides a foundational understanding for leveraging their potential in educational settings, guiding both students and educators in their selection process.
