
Over-Reasoning and Redundant Calculation of Large Language Models (2401.11467v2)

Published 21 Jan 2024 in cs.CL

Abstract: LLMs can solve problems step-by-step. While this chain-of-thought (CoT) reasoning boosts LLMs' performance, it is unclear whether LLMs "know" when to use CoT and whether that reasoning is always necessary to answer the question. This paper shows that LLMs tend to generate redundant calculations and reasoning on GSM8K-Zero, a manually constructed math QA dataset. GSM8K-Zero is constructed so that its questions can be answered without any calculation, yet LLMs, including the Llama-2 models and Claude-2, tend to produce lengthy and unnecessary calculations when answering them. We also conduct experiments to explain why LLMs generate redundant calculations and reasoning. GSM8K-Zero is publicly available at https://github.com/d223302/Over-Reasoning-of-LLMs and https://huggingface.co/datasets/dcml0714/GSM8K-Zero.
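To make the phenomenon concrete, here is a minimal sketch of how one might flag redundant calculation in a model's response to a GSM8K-Zero-style question. The question text, the detection regex, and the helper name are illustrative assumptions, not the paper's actual construction or evaluation procedure:

```python
import re

def contains_calculation(response: str) -> bool:
    """Heuristic (illustrative, not the paper's metric): flag responses
    that include an arithmetic expression such as '7 + 0 = 7', even
    though the question already states its answer directly."""
    return re.search(r"\d+\s*[+\-*/x×]\s*\d+\s*=", response) is not None

# A GSM8K-Zero-style question states its answer directly:
question = "Jane has 7 apples. How many apples does Jane have?"

direct = "Jane has 7 apples."
redundant = "Jane starts with 7 apples. 7 + 0 = 7, so she has 7 apples."

print(contains_calculation(direct))     # False: no calculation needed or present
print(contains_calculation(redundant))  # True: unnecessary calculation detected
```

The point of the dataset is that the second, calculation-laden answer style is what LLMs tend to produce even when the first suffices.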
