Diversity of Thought Improves Reasoning Abilities of LLMs

(2310.07088)

Published Oct 11, 2023 in cs.CL and cs.AI

Abstract

LLMs are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps, or ensembling various generations through modifying decoding steps, boosts performance. However, these methods assume that the input prompt is fixed and expect the decoding strategies to introduce the diversity needed for ensembling. In this work, we discuss how one can create and leverage variations of the input prompt as a means of diversity of thought. We propose a method that automatically improves prompt diversity by soliciting feedback from the LLM to ideate approaches that are apt for the problem. We then ensemble the diverse prompts in our method DIV-SE (DIVerse reasoning path Self-Ensemble) across multiple inference calls, or use diverse approaches within a single inference call; we call the latter IDIV-SE (In-call DIVerse reasoning path Self-Ensemble). Apart from our approaches outperforming prior work, DIV-SE (in particular) advances state-of-the-art performance on the challenging planning and graph coloring benchmarks. Our results improve the Pareto frontier of the accuracy-cost trade-off.
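
The ensembling idea behind DIV-SE can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt template, the function names (`build_prompts`, `div_se_vote`, `fake_model`), and the use of a simple majority vote to combine answers are all assumptions made for illustration, and the model call is replaced by a deterministic stand-in so the example runs without any API.

```python
from collections import Counter
from typing import Callable, Sequence


def build_prompts(question: str, approaches: Sequence[str]) -> list[str]:
    """Create one prompt per approach (hypothetical template, not from the paper)."""
    return [
        f"Solve the problem using this approach: {approach}\n\nProblem: {question}"
        for approach in approaches
    ]


def div_se_vote(
    question: str,
    approaches: Sequence[str],
    ask_model: Callable[[str], str],
) -> str:
    """Query the model once per diverse prompt and return the most common answer.

    Majority voting is an assumed aggregation rule for this sketch.
    Ties resolve to the answer that was seen first.
    """
    answers = [ask_model(prompt).strip() for prompt in build_prompts(question, approaches)]
    return Counter(answers).most_common(1)[0][0]


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call, used only to make the sketch runnable."""
    if "algebra" in prompt or "working backwards" in prompt:
        return "12"
    return "13"


approaches = ["algebra", "working backwards", "visualization"]
final_answer = div_se_vote("What is 3 * 4?", approaches, fake_model)
print(final_answer)
```

In this sketch each approach costs one inference call, which is the multi-call setting the abstract attributes to DIV-SE; placing all approaches in a single prompt would correspond to the in-call variant, IDIV-SE.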

