Emergent Mind

Abstract

Multi-hop QA (MHQA) involves step-by-step reasoning to answer complex questions by finding multiple relevant supporting facts. However, the reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored, and they often fall short when answering multi-hop questions. Moreover, it is unclear whether LLMs follow a desired reasoning chain to reach the right final answer. In this paper, we propose a generative question decomposition method (GenDec) from the perspective of explainable QA: it generates independent and complete sub-questions based on additional extracted evidence to enhance LLMs' reasoning ability in RAG. To demonstrate the impact, generalization, and robustness of GenDec, we conduct two sets of experiments: the first combines GenDec with small QA systems on paragraph retrieval and QA tasks; the second examines the reasoning capabilities of various state-of-the-art LLMs, including GPT-4 and GPT-3.5, combined with GenDec. We experiment on the HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA datasets.

Overview

  • GenDec introduces a generative model for question decomposition to simplify Multi-hop Question Answering (MHQA) into manageable sub-tasks, enhancing reasoning ability.

  • By generating independent sub-questions and integrating additional evidence, GenDec improves LLMs' accuracy and reduces error propagation in the MHQA process.

  • Comprehensive experiments on MHQA datasets demonstrate GenDec's significant improvements in QA performance and reasoning capabilities over state-of-the-art models.

  • Despite its advancements, future work for GenDec includes addressing limitations related to information retrieval quality and expanding its applicability to more diverse datasets and languages.

GenDec: Enhancing Multi-hop Question Answering through Generative Question Decomposition

Introduction

In the realm of natural language processing, the task of Multi-hop Question Answering (MHQA) stands out due to its complex requirement of iteratively combining information from various sources to arrive at a final answer. Despite advancements in LLMs and Retrieval-Augmented Generation (RAG), these systems often hit a ceiling in MHQA due to the intricate reasoning chains needed. In addressing this challenge, we introduce GenDec, a novel approach that utilizes a generative model for question decomposition (QD), aiming to simplify the MHQA process into manageable sub-tasks while improving reasoning ability.

Generative Question Decomposition (GenDec)

The cornerstone of GenDec lies in its ability to generate independent, complete sub-questions from a given multi-hop question, enabling a structured decomposition that facilitates easier and more accurate answering of complex questions. By incorporating additional extracted evidence into the generation process, GenDec improves the reasoning capabilities of LLMs within a RAG framework. This generative approach not only aids in accurately pinpointing relevant information but also in reducing the chances of error propagation—a common hurdle in sequential answering of decomposed sub-questions.
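To make the pipeline concrete, here is a minimal sketch of GenDec-style decomposition followed by independent sub-question answering. The `decompose` and `answer` functions below are hypothetical stand-ins for the paper's fine-tuned generative models (which condition on the question plus retrieved evidence); only the control flow is meant to be illustrative.

```python
# Sketch of GenDec-style MHQA: decompose a multi-hop question into
# independent, complete sub-questions, answer each one separately,
# and return the final answer. The models are stubbed out here.

def decompose(question: str, evidence: list[str]) -> list[str]:
    """Stand-in decomposer. A real system would generate these with a
    seq2seq model; because evidence is incorporated, each sub-question
    is complete (no pronoun depends on a previous answer)."""
    return [
        "Who directed the film Inception?",
        "What film did Christopher Nolan make in 2014?",
    ]

def answer(sub_question: str, evidence: list[str]) -> str:
    """Stand-in reader: look answers up in a toy evidence table."""
    toy_kb = {
        "Who directed the film Inception?": "Christopher Nolan",
        "What film did Christopher Nolan make in 2014?": "Interstellar",
    }
    return toy_kb[sub_question]

def gendec_qa(question: str, evidence: list[str]) -> str:
    # Each sub-question is answered independently, so an error on one
    # hop does not corrupt the input to the next (less propagation).
    final = None
    for sq in decompose(question, evidence):
        final = answer(sq, evidence)
    return final

print(gendec_qa("Which 2014 film was made by the director of Inception?", []))
```

The key design point illustrated is that the sub-questions are self-contained: the second hop names "Christopher Nolan" directly rather than referring back to "that director", so a wrong first-hop answer cannot cascade into the second.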

Experimentation and Evaluation

Experimental Design

To validate GenDec's effectiveness, we conducted comprehensive experiments across a suite of MHQA datasets including HotpotQA, 2WikiMultiHopQA, MuSiQue, and PokeMQA. Our evaluation benchmarked GenDec against state-of-the-art (SOTA) models in paragraph retrieval and QA, comparing its performance in enhancing the reasoning and answering abilities of both fine-tuned QA models and LLMs like GPT-4 and GPT-3.5.
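For context, benchmarks like HotpotQA typically score answers with exact match (EM) and token-level F1 after text normalization. The sketch below shows these standard metrics (not code from the paper) so the evaluation protocol is concrete.

```python
# Standard SQuAD/HotpotQA-style answer scoring: normalize both
# strings, then compute exact match and token-overlap F1.
import re
from collections import Counter

def normalize(s: str) -> str:
    s = s.lower()
    s = re.sub(r"\b(a|an|the)\b", " ", s)   # drop English articles
    s = re.sub(r"[^a-z0-9 ]", " ", s)       # drop punctuation
    return " ".join(s.split())              # collapse whitespace

def exact_match(pred: str, gold: str) -> int:
    return int(normalize(pred) == normalize(gold))

def f1(pred: str, gold: str) -> float:
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1
print(round(f1("Paris France", "Paris"), 2))            # 0.67
```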

Findings

The results from our extensive evaluation underscore GenDec's impact in significantly improving QA performance across all tested models and datasets. When integrated with fine-tuned models, GenDec demonstrates a remarkable improvement in both answer accuracy and the quality of paragraph retrieval. Moreover, the application of GenDec with LLMs revealed a notable enhancement in their reasoning capabilities, underscoring the importance of structured question decomposition in MHQA tasks.

Theoretical and Practical Implications

GenDec's introduction marks a significant step in the advancement of MHQA systems. Theoretically, it pushes the envelope in understanding the mechanics of effective question decomposition and its role in multi-source information integration. Practically, GenDec sets new benchmarks in MHQA performance, presenting a robust model that can be further refined and adapted for a broader range of applications beyond the datasets evaluated. Its success opens avenues for deeper explorations into generative approaches for question answering, particularly in leveraging LLMs for complex reasoning tasks.

Future Directions and Limitations

Although GenDec represents a leap forward in MHQA, it is not without its limitations. The reliance on the quality of retrieved paragraphs for effective question decomposition remains a challenge, potentially affecting the model's performance in scenarios with poor information retrieval. Future work will need to explore more sophisticated retrieval mechanisms and investigate the integration of knowledge bases to further enhance GenDec's applicability and robustness. Additionally, expanding GenDec's testing across more diverse datasets and languages would provide a more comprehensive understanding of its capabilities and limitations.

Conclusion

GenDec embodies a significant advancement in MHQA by introducing a generative approach to question decomposition that enhances both the reasoning capabilities of LLMs and the overall performance of QA models. Its effectiveness, demonstrated across several experiments, highlights the potential of generative models in tackling complex multi-hop reasoning tasks. As we look to the future, GenDec's foundational work sets the stage for further innovations in the field of question answering, promising more sophisticated and capable MHQA systems.
