Emergent Mind

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

(arXiv:2304.13007)
Published Apr 25, 2023 in cs.CL and cs.AI

Abstract

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a voting mechanism over the final answers, but the intermediate steps themselves are discarded. While such approaches improve performance, they do not consider the relations between intermediate steps across chains and do not provide a unified explanation for the predicted answer. We introduce Multi-Chain Reasoning (MCR), an approach that prompts large language models (LLMs) to meta-reason over multiple chains of thought, rather than aggregating their answers. MCR examines different reasoning chains, mixes information between them, and selects the most relevant facts when generating an explanation and predicting the answer. MCR outperforms strong baselines on 7 multi-hop QA datasets. Moreover, our analysis reveals that MCR explanations exhibit high quality, enabling humans to verify the model's answers.
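The contrast the abstract draws — voting over final answers versus meta-reasoning over the intermediate steps of all chains — can be sketched in a few lines. This is a minimal, hypothetical illustration: `sample_chain` stands in for temperature-sampled LLM calls, and `meta_reason` is a toy stand-in for the second LLM prompt that MCR uses to read all chains jointly; neither reflects the paper's actual prompts.

```python
from collections import Counter

def sample_chain(question, seed):
    """Hypothetical stand-in for sampling one chain of thought from an LLM.
    Returns (intermediate_steps, final_answer)."""
    chains = {
        0: (["A was born in 1879.", "B opened in 1889."], "B"),
        1: (["A was born in 1879.", "B opened in 1889."], "B"),
        2: (["A was born in 1880.", "B opened in 1889."], "A"),
    }
    return chains[seed % 3]

def self_consistency(question, n=3):
    """Baseline: majority vote over final answers; intermediate steps
    are discarded."""
    answers = [sample_chain(question, i)[1] for i in range(n)]
    return Counter(answers).most_common(1)[0][0]

def meta_reason(question, facts):
    """Toy meta-reasoner: keep facts supported by more than one chain.
    In MCR proper, this step is another LLM prompt that selects relevant
    facts and generates a unified explanation plus an answer."""
    counts = Counter(facts)
    return [fact for fact, c in counts.items() if c > 1]

def multi_chain_reasoning(question, n=3):
    """MCR-style: pool the intermediate steps of all chains into one
    context and reason over them jointly, rather than voting."""
    context = [step for i in range(n) for step in sample_chain(question, i)[0]]
    return meta_reason(question, context)

q = "Which happened first?"
print(self_consistency(q))       # → "B" (vote over final answers only)
print(multi_chain_reasoning(q))  # facts shared across chains, usable
                                 # as a unified explanation
```

The key structural difference is where aggregation happens: self-consistency aggregates at the answer level, while the MCR-style function aggregates at the evidence level, so conflicting steps (here, the two birth years) can be adjudicated before an answer is produced.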

