Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities (2312.15006v2)
Abstract: This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of LLMs. The investigation examines three prescriptive prompting methods (simple, persona, and conversational prompting) known for their effectiveness on linguistic tasks with LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, using extensive problem sets from the MATH, GSM8K, and MMLU datasets, which encompass a broad spectrum of mathematical challenges. A grading script adapted to each dataset determines how effectively these prompting interventions improve the model's mathematical reasoning. Contrary to expectations, our empirical analysis reveals that none of the investigated methods consistently improves over ChatGPT-3.5's baseline performance, and some cause significant degradation. Our findings suggest that prompting strategies do not necessarily generalize to new domains; in this study, they fail to enhance mathematical performance.
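To make the grading step concrete, the sketch below illustrates one way such a script could work for GSM8K-style problems: build a prompt under one of the three methods, then extract the final numeric answer from the model's free-form response and compare it with the reference answer. This is a minimal illustration, not the authors' script; the prompt templates, function names, and the exact-match rule are assumptions made for the sake of the example.

```python
import re

# Hypothetical templates for the three prompting methods studied in the paper.
PROMPTS = {
    "simple": "{question}",
    "persona": ("You are an expert mathematician. "
                "Solve the following problem.\n{question}"),
    "conversational": ("Let's work through this problem together, "
                       "step by step.\n{question}"),
}

def build_prompt(method: str, question: str) -> str:
    """Fill the chosen prompt template with the problem statement."""
    return PROMPTS[method].format(question=question)

def extract_final_number(response: str) -> str | None:
    """Return the last number appearing in the response, with commas and '$' stripped."""
    matches = re.findall(r"-?\$?\d[\d,]*(?:\.\d+)?", response)
    if not matches:
        return None
    return matches[-1].replace(",", "").replace("$", "")

def grade_gsm8k(response: str, reference: str) -> bool:
    """GSM8K-style grading: exact match on the final numeric answer."""
    predicted = extract_final_number(response)
    return predicted is not None and float(predicted) == float(reference)

if __name__ == "__main__":
    question = "Each box holds 12 pencils. How many pencils are in 4 boxes?"
    print(build_prompt("persona", question))
    model_response = "There are 4 * 12 = 48 pencils in total."
    print(grade_gsm8k(model_response, "48"))  # True
```

Grading MATH and MMLU would require different normalization (LaTeX expressions and multiple-choice letters, respectively), which is why the abstract notes that the script was adapted to each dataset.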
- Dan Hendrycks et al. Measuring mathematical problem solving with the MATH dataset, 2021.
- Karl Cobbe et al. Training verifiers to solve math word problems, 2021.
- Dan Hendrycks et al. Measuring massive multitask language understanding, 2021.
- Jason Wei et al. Chain-of-thought prompting elicits reasoning in large language models, 2023.
- Ernest Davis. Mathematics, word problems, common sense, and artificial intelligence, 2023.
- Paulo Shakarian et al. An independent evaluation of ChatGPT on mathematical word problems (MWP), 2023.
- Zachary A. Pardos and Shreya Bhandari. Learning gain differences between ChatGPT and human tutor generated algebra hints, 2023.