
Graph-Guided Reasoning for Multi-Hop Question Answering in Large Language Models

(2311.09762)
Published Nov 16, 2023 in cs.CL, cs.AI, and cs.LG

Abstract

Chain-of-Thought (CoT) prompting has boosted the multi-step reasoning capabilities of LLMs by generating a series of rationales before the final answer. We analyze the reasoning paths generated by CoT and find two issues in multi-step reasoning: (i) generating rationales irrelevant to the question, and (ii) failing to compose the subquestions or queries needed to generate or retrieve all the relevant information. To address them, we propose a graph-guided CoT prompting method, which guides the LLMs to reach the correct answer through graph representation and verification steps. Specifically, we first leverage LLMs to construct a "question/rationale graph" using knowledge-extraction prompting on the initial question and the rationales generated in previous steps. The graph verification step then diagnoses the current rationale triplet by comparing it against the existing question/rationale graph, filtering out irrelevant rationales and generating follow-up questions to obtain relevant information. Additionally, we generate CoT paths that exclude the extracted graph information, to capture context information missed during graph extraction. Our graph-guided reasoning method shows superior performance compared to previous CoT prompting and its variants on multi-hop question answering benchmark datasets.

The graph-guided prompting method comprises question graph extraction, intermediate question generation, answer generation, and rationale verification.

Overview

  • The paper 'Graph-Guided Reasoning for Multi-Hop Question Answering in LLMs' introduces a method to improve the reasoning capabilities of LLMs by addressing deficiencies in Chain-of-Thought (CoT) prompting approaches.

  • The proposed method involves constructing a question graph, generating and verifying subquestions and their rationales, and creating contextual CoT paths to ensure relevant information is used in reasoning.

  • Evaluated on benchmark datasets, the graph-guided reasoning method consistently outperforms existing CoT approaches, highlighting its potential for both practical applications and theoretical advancements in AI.

Graph-Guided Reasoning for Multi-Hop Question Answering in LLMs

The paper "Graph-Guided Reasoning for Multi-Hop Question Answering in LLMs," authored by Jinyoung Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, and Joo-Kyung Kim, presents a methodical approach to enhancing the reasoning capabilities of LLMs in multi-hop question answering (QA). The authors identify and address the deficiencies of existing Chain-of-Thought (CoT) prompting approaches, which include generating irrelevant rationales and failing to compose necessary subquestions for retrieving pertinent information.

Introduction

LLMs have demonstrated significant proficiency across varied natural language processing tasks as model sizes have scaled up. Nonetheless, complex reasoning tasks, such as arithmetic, commonsense, and multi-hop QA, continue to pose challenges. Traditional CoT prompting methods have improved reasoning by generating intermediate rationales but still struggle with irrelevant rationale generation and hallucination.

Motivation

The paper identifies two critical problems with existing CoT approaches:

  1. Generation of rationales that are irrelevant to the posed question.
  2. Inability to compose the subquestions or queries needed to gather all relevant information.

These limitations impede the model's ability to accurately reason through multiple steps required in multi-hop QA tasks.

Proposed Method

To mitigate these issues, the authors propose a graph-guided CoT prompting method. The key steps in their approach are as follows (a minimal code sketch follows the list):

  1. Question Graph Construction: Using LLM prompting, a question graph is constructed by extracting triplets from the initial question. This graph represents relationships and serves as a foundation for guided reasoning.
  2. Subquestion Generation: Based on the question graph, multiple subquestions are generated. These subquestions help in decomposing the original complex question into simpler, more manageable parts.
  3. Rationale Generation: For each subquestion, the model generates intermediate rationales. This process ensures that each step of reasoning is backed by relevant information.
  4. Rationale Verification: Generated rationales are compared against the question graph. If a rationale is deemed irrelevant, it is filtered out. Moreover, follow-up questions are posed to gather any missing relevant information.
  5. Contextual CoT Paths: Conventional CoT paths are generated excluding the entities mentioned in the question graph to capture context information potentially missed during graph extraction.
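
To make these steps concrete, here is a minimal Python sketch of the graph-guided loop. It is an illustration under several assumptions, not the authors' code: `llm` stands in for any prompted model call (the paper uses Llama-2), the prompt strings are placeholders rather than the paper's templates, the entity-overlap check in `verify_rationale` is a simplified stand-in for the paper's LLM-based graph verification, and `max_hops` is an invented budget parameter.

```python
from typing import Callable, Set, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)

# Stand-in for any chat-completion call mapping a prompt string to a
# completion string (the paper uses prompted Llama-2 models).
LLM = Callable[[str], str]


def extract_triplets(llm: LLM, text: str) -> Set[Triplet]:
    """Knowledge-extraction prompting: ask the model to emit one
    'subject | relation | object' triplet per line, then parse them.
    The prompt wording here is illustrative, not the paper's template."""
    prompt = (
        "Extract knowledge triplets from the text below.\n"
        "Write one per line as: subject | relation | object\n\n"
        f"Text: {text}\nTriplets:"
    )
    triplets: Set[Triplet] = set()
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triplets.add((parts[0], parts[1], parts[2]))
    return triplets


def verify_rationale(graph: Set[Triplet], candidates: Set[Triplet]):
    """Graph verification, simplified to an entity-overlap heuristic: a
    candidate rationale triplet counts as relevant if it shares an entity
    with the existing question/rationale graph; the rest trigger follow-up
    questions. The paper performs this comparison via LLM prompting."""
    entities = {e for (s, _, o) in graph for e in (s, o)}
    relevant = {t for t in candidates if t[0] in entities or t[2] in entities}
    return relevant, candidates - relevant


def graph_guided_answer(llm: LLM, question: str, max_hops: int = 4) -> str:
    """The overall loop: build the question graph, then alternate rationale
    generation and graph verification, composing follow-up questions when
    a rationale introduces unverified information."""
    graph = extract_triplets(llm, question)
    context = question
    for _ in range(max_hops):
        rationale = llm(f"{context}\nNext reasoning step:")
        relevant, unverified = verify_rationale(
            graph, extract_triplets(llm, rationale)
        )
        graph |= relevant  # grow the graph with verified facts
        if unverified:     # ask for the missing relevant information
            topics = "; ".join(" ".join(t) for t in unverified)
            context += "\n" + llm(f"Ask a follow-up question about: {topics}")
        else:
            context += "\n" + rationale
    return llm(f"{context}\nTherefore, the final answer is:")
```

In a real setting, `llm` would wrap a Llama-2 endpoint with few-shot exemplars, and the contextual CoT paths of step 5 would be generated alongside this loop and merged at answer time; both are omitted here for brevity.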

Results and Evaluation

The authors evaluate their method on three multi-hop QA benchmark datasets: 2WikiMultihopQA, MuSiQue, and Bamboogle. They conduct experiments using Llama-2 models of varying sizes (13B and 70B). The proposed graph-guided reasoning approach consistently outperforms existing CoT prompting methods across all datasets and model sizes.

Numerical Performance

  • For 2WikiMultihopQA, the graph-guided reasoning method achieves 39.2% EM (Exact Match) and 46.87% F1, compared to 37.6% EM and 44.04% F1 for the strongest baseline (Self-Consistency) with Llama-2-70B (see the scoring sketch after this list).
  • In the open-book setting, the proposed method scores an impressive 54.2% EM and 63.97% F1 on 2WikiMultihopQA.
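
For context, EM and F1 here are the standard token-level QA metrics. A minimal sketch of how they are typically computed follows; it assumes SQuAD-style answer normalization, which may differ in detail from the paper's exact evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 iff the normalized strings are identical."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```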

Implications

The introduction of graph-guided CoT prompting addresses key limitations of traditional methods, notably through its structured approach to generating and verifying rationales. The implications of this research are significant for both practical applications and theoretical advancements in AI:

  • Practical: Enhanced performance in multi-hop QA tasks can improve AI applications requiring complex decision-making and reasoning, such as customer service automation and advanced tutoring systems.
  • Theoretical: The integration of graph structures in CoT prompting paves the way for more sophisticated hybrid models combining symbolic reasoning with deep learning.

Future Directions

Future work could explore further refinement of graph extraction techniques and better integration with retrieval-augmented generation methods. Additionally, expanding the approach to other types of questions and reasoning tasks could demonstrate the broader applicability of the method.

In summary, this paper introduces a systematic and effective approach to enhancing LLMs' reasoning capabilities in multi-hop QA tasks by leveraging graph-based knowledge representation and verification, setting a new benchmark for future research in this domain.
