Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving (2405.12205v1)
Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including the ability to name the skills and procedures to apply for a given task. We explore this primarily in the context of math reasoning, developing a prompt-guided interaction procedure that gets a powerful LLM to assign sensible skill labels to math questions and then perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes, we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in the math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed; it is then presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8K and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.
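To make the pipeline in the abstract concrete, here is a minimal Python sketch of the three stages it describes: labeling training questions with skills, clustering those labels into coarser families, and solving test questions with skill-matched exemplars. The helper `ask_llm`, the prompt wording, and the item fields (`question`, `solution`) are illustrative assumptions, not the authors' actual prompts or interface.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical stand-in for whatever chat-completion call you use (e.g. a GPT-4
# client): it takes a prompt string and returns the model's text reply.
AskLLM = Callable[[str], str]


def label_skills(train_items: List[dict], ask_llm: AskLLM) -> Dict[str, List[dict]]:
    """Stage 1: ask the LLM to name the skill each training question exercises,
    then group solved questions by that skill label."""
    by_skill: Dict[str, List[dict]] = defaultdict(list)
    for item in train_items:  # each item: {"question": ..., "solution": ...}
        skill = ask_llm(
            "Name, in a few words, the main math skill needed to solve this "
            f"problem:\n{item['question']}\nSkill:"
        ).strip().lower()
        by_skill[skill].append(item)
    return by_skill


def cluster_skills(skill_names: List[str], ask_llm: AskLLM) -> Dict[str, str]:
    """Stage 2: semantic clustering of fine-grained labels into coarser skill
    families, here done by asking the LLM itself to propose the grouping."""
    reply = ask_llm(
        "Group these skill labels into a small number of coarser skill families. "
        "Output one line per label in the form 'label -> family':\n"
        + "\n".join(skill_names)
    )
    mapping: Dict[str, str] = {}
    for line in reply.splitlines():
        if "->" in line:
            label, family = (part.strip().lower() for part in line.split("->", 1))
            mapping[label] = family
    return mapping


def solve_with_skill_exemplars(question: str,
                               exemplars_by_family: Dict[str, List[dict]],
                               ask_llm: AskLLM,
                               k: int = 4) -> str:
    """Test time: show the model the full list of coarse skill labels, let it pick
    the relevant one, then prompt it with k randomly chosen solved exemplars
    carrying that label."""
    families = sorted(exemplars_by_family)
    picked = ask_llm(
        "Which one of these skills is needed for the problem below? "
        f"Answer with the skill name only.\nSkills: {', '.join(families)}\n"
        f"Problem: {question}"
    ).strip().lower()
    # Fall back to the whole pool if the model names an unknown skill.
    pool = exemplars_by_family.get(picked) or sum(exemplars_by_family.values(), [])
    shots = random.sample(pool, min(k, len(pool)))
    prompt = "".join(
        f"Q: {ex['question']}\nA: {ex['solution']}\n\n" for ex in shots
    ) + f"Q: {question}\nA:"
    return ask_llm(prompt)
```

In use, one would run `label_skills` and `cluster_skills` once over the GSM8K or MATH training split, merge the per-label exemplar lists by family, and then call `solve_with_skill_exemplars` for each test question; the random choice of exemplars within the chosen skill family mirrors the selection step described in the abstract.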