Emergent Mind

Abstract

Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions by finding multiple relevant supporting facts. However, the reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored, and they often fall short when answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA: it generates independent and complete sub-questions based on additional extracted evidence to enhance LLMs' reasoning ability in RAG. To demonstrate the impact, generalization, and robustness of GenDec, we conduct two sets of experiments: the first combines GenDec with small QA systems on paragraph retrieval and QA tasks; the second examines the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, combined with GenDec. We experiment on the HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA datasets.

Overview

  • GenDec introduces a generative model for question decomposition to simplify Multi-hop Question Answering (MHQA) into manageable sub-tasks, enhancing reasoning ability.

  • By generating independent sub-questions and integrating additional evidence, GenDec improves LLMs' accuracy and reduces error propagation in the MHQA process.

  • Comprehensive experiments on MHQA datasets demonstrate GenDec's significant improvements in QA performance and reasoning capabilities over state-of-the-art models.

  • Despite its advancements, future work for GenDec includes addressing limitations related to information retrieval quality and expanding its applicability to more diverse datasets and languages.

GenDec: Enhancing Multi-hop Question Answering through Generative Question Decomposition

Introduction

In the realm of natural language processing, the task of Multi-hop Question Answering (MHQA) stands out due to its complex requirement of iteratively combining information from various sources to arrive at a final answer. Despite advancements in LLMs and Retrieval-Augmented Generation (RAG), these systems often hit a ceiling in MHQA due to the intricate reasoning chains needed. In addressing this challenge, we introduce GenDec, a novel approach that utilizes a generative model for question decomposition (QD), aiming to simplify the MHQA process into manageable sub-tasks while improving reasoning ability.

Generative Question Decomposition (GenDec)

The cornerstone of GenDec lies in its ability to generate independent, complete sub-questions from a given multi-hop question, enabling a structured decomposition that facilitates easier and more accurate answering of complex questions. By incorporating additional extracted evidence into the generation process, GenDec improves the reasoning capabilities of LLMs within a RAG framework. This generative approach not only aids in accurately pinpointing relevant information but also in reducing the chances of error propagation—a common hurdle in sequential answering of decomposed sub-questions.
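To make the pipeline concrete, here is a minimal sketch of GenDec-style decomposition followed by independent sub-question answering. The `decompose` and `answer` functions below are hypothetical stand-ins for the paper's fine-tuned generative models (which condition on the question plus retrieved evidence); only the control flow is meant to be illustrative.

```python
# Sketch of GenDec-style MHQA: decompose a multi-hop question into
# independent, complete sub-questions, answer each one separately,
# and return the final answer. The models are stubbed out here.

def decompose(question: str, evidence: list[str]) -> list[str]:
    """Stand-in decomposer. A real system would generate these with a
    seq2seq model; because evidence is incorporated, each sub-question
    is complete (no pronoun depends on a previous answer)."""
    return [
        "Who directed the film Inception?",
        "What film did Christopher Nolan make in 2014?",
    ]

def answer(sub_question: str, evidence: list[str]) -> str:
    """Stand-in reader: look answers up in a toy evidence table."""
    toy_kb = {
        "Who directed the film Inception?": "Christopher Nolan",
        "What film did Christopher Nolan make in 2014?": "Interstellar",
    }
    return toy_kb[sub_question]

def gendec_qa(question: str, evidence: list[str]) -> str:
    # Each sub-question is answered independently, so an error on one
    # hop does not corrupt the input to the next (less propagation).
    final = None
    for sq in decompose(question, evidence):
        final = answer(sq, evidence)
    return final

print(gendec_qa("Which 2014 film was made by the director of Inception?", []))
```

The key design point illustrated is that the sub-questions are self-contained: the second hop names "Christopher Nolan" directly rather than referring back to "that director", so a wrong first-hop answer cannot cascade into the second.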

Experimentation and Evaluation

Experimental Design

To validate GenDec's effectiveness, we conducted comprehensive experiments across a suite of MHQA datasets including HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA. Our evaluation benchmarked GenDec against state-of-the-art (SOTA) models in paragraph retrieval and QA, comparing its performance in enhancing the reasoning and answering abilities of both fine-tuned QA models and LLMs like GPT-4 and GPT-3.5.
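For context, benchmarks like HotpotQA typically score answers with exact match (EM) and token-level F1 after text normalization. The sketch below shows these standard metrics (not code from the paper) so the evaluation protocol is concrete.

```python
# Standard SQuAD/HotpotQA-style answer scoring: normalize both
# strings, then compute exact match and token-overlap F1.
import re
from collections import Counter

def normalize(s: str) -> str:
    s = s.lower()
    s = re.sub(r"\b(a|an|the)\b", " ", s)   # drop English articles
    s = re.sub(r"[^a-z0-9 ]", " ", s)       # drop punctuation
    return " ".join(s.split())              # collapse whitespace

def exact_match(pred: str, gold: str) -> int:
    return int(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1
print(round(f1("Paris France", "Paris"), 2))            # 0.67
```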

Findings

The results from our extensive evaluation underscore GenDec's impact in significantly improving QA performance across all tested models and datasets. When integrated with fine-tuned models, GenDec demonstrates a remarkable improvement in both answer accuracy and the quality of paragraph retrieval. Moreover, the application of GenDec with LLMs revealed a notable enhancement in their reasoning capabilities, underscoring the importance of structured question decomposition in MHQA tasks.

Theoretical and Practical Implications

GenDec's introduction marks a significant step in the advancement of MHQA systems. Theoretically, it pushes the envelope in understanding the mechanics of effective question decomposition and its role in multi-source information integration. Practically, GenDec sets new benchmarks in MHQA performance, presenting a robust model that can be further refined and adapted for a broader range of applications beyond the datasets evaluated. Its success opens avenues for deeper explorations into generative approaches for question answering, particularly in leveraging LLMs for complex reasoning tasks.

Future Directions and Limitations

Although GenDec represents a leap forward in MHQA, it is not without its limitations. The reliance on the quality of retrieved paragraphs for effective question decomposition remains a challenge, potentially affecting the model's performance in scenarios with poor information retrieval. Future work will need to explore more sophisticated retrieval mechanisms and investigate the integration of knowledge bases to further enhance GenDec's applicability and robustness. Additionally, expanding GenDec's testing across more diverse datasets and languages would provide a more comprehensive understanding of its capabilities and limitations.

Conclusion

GenDec embodies a significant advancement in MHQA by introducing a generative approach to question decomposition that enhances both the reasoning capabilities of LLMs and the overall performance of QA models. Its effectiveness, demonstrated across several experiments, highlights the potential of generative models in tackling complex multi-hop reasoning tasks. As we look to the future, GenDec's foundational work sets the stage for further innovations in the field of question answering, promising more sophisticated and capable MHQA systems.
