Abstract

Recent efforts to address hallucinations in LLMs have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attributable text generation approach, prioritizing concise attributions. Our method, named "Attribute First, then Generate", breaks down the conventional end-to-end generation process into three intuitive steps: content selection, sentence planning, and sequential sentence generation. By initially identifying relevant source segments ("select first") and then conditioning the generation process on them ("then generate"), we ensure these segments also act as the output's fine-grained attributions ("select" becomes "attribute"). Tested on Multi-document Summarization and Long-form Question-answering, our method not only yields more concise citations than the baselines but also maintains, and in some cases enhances, both generation quality and attribution accuracy. Furthermore, it significantly reduces the time required for fact verification by human assessors.

Figure: the model outputs fluent text aligned with the inputs, including detailed attributions to specific highlighted text spans.

Overview

  • The paper introduces a novel approach named 'Attribute First, then Generate' for Grounded Text Generation, prioritizing reliability and factuality by structuring the text generation process into content selection, sentence planning, and sequential sentence generation.

  • It reformulates the task to focus on fine-grained, sentence-level attributions, allowing for more precise fact verification and reducing human effort in identifying relevant source segments.

  • Two implementation strategies, in-context learning and fine-tuning, were explored to adapt the model to the requirements of this approach, covering task-specific needs such as content salience for summarization and query relevance for question answering.

  • Experimental evaluations on Multi-document Summarization (MDS) and Long-form Question-answering (LFQA) showed that the method maintains, and in some cases improves, generation quality and attribution accuracy while producing markedly more concise citations, making fact-checking more efficient.

Locally-attributable Grounded Text Generation through a Structured Multi-step Approach

Introduction

Recent work in Grounded Text Generation has highlighted the need to increase the reliability and factuality of generated texts. The proposed approach, named "Attribute First, then Generate," reshapes the conventional generation process with a multi-step strategy comprising content selection, sentence planning, and sequential sentence generation. Because each output sentence is grounded in specific source segments, the resulting attributions remain concise, which in turn makes fact verification by human assessors substantially faster.

Task Reformulation and Method Overview

The task of Locally-attributable Grounded Text Generation is reformulated to prioritize fine-grained, sentence-level attributions, ensuring that each generated fact is supported by specific text snippets from the source documents. This granular attribution minimizes the user's fact-verification effort by pointing to only the most pertinent snippets, ranging from individual sentences to sub-sentence spans.
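
To make this granularity concrete, a locally-attributed output over a toy two-document input might be represented as follows; the content and the exact data layout here are illustrative assumptions, not the paper's format.

```python
# Toy, invented example: each generated sentence carries citations to the
# specific source spans that support it, rather than to whole documents.
attributed_output = [
    {
        "sentence": "The new light-rail line is expected to open in 2026.",
        "citations": [
            {"doc": 2, "span": "the light-rail line is expected to open in 2026"},
        ],
    },
    {
        "sentence": "Its budget was approved by the city council on Tuesday.",
        "citations": [
            {"doc": 1, "span": "council approved the new transit budget on Tuesday"},
        ],
    },
]
```

A reader verifying the output then only needs to check each sentence against its cited spans, rather than scanning the full source documents.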

The "Attribute First, then Generate" scheme proposes a structured, intuitive approach to text generation by separating the process into distinct steps - starting with content selection to identify relevant source segments, followed by sentence planning to organize these segments into coherent structures, and concluding with sentence-by-sentence generation, ensuring the generation is closely guided by initially selected attributions.

Implementation Strategies

To apply the proposed framework, two strategies were explored: in-context learning and fine-tuning. The in-context learning strategy uses prompt-based techniques to guide the model through each step of the scheme, adapting to task-specific needs such as salience for summarization or query relevance for question answering. The fine-tuning strategy instead tailors model components to the requirements of the "Attribute First, then Generate" approach, adapting content selection and generation accordingly.
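
As a sketch of the in-context variant, the content-selection step could be phrased as a few-shot prompt along the following lines; the demonstration text, instruction wording, and span format are invented for illustration and are not the paper's actual prompts.

```python
# Hypothetical few-shot prompt for the content-selection step: one worked
# demonstration, followed by the real documents and an instruction that asks
# either for salient spans (summarization) or query-relevant spans (QA).
FEW_SHOT_DEMO = """\
Documents:
[1] The city council approved the new transit budget on Tuesday.
[2] Officials said construction of the light-rail line will begin next spring.

Selected spans:
[1] approved the new transit budget on Tuesday
[2] construction of the light-rail line will begin next spring
"""

def build_selection_prompt(documents: list[str], query: str = "") -> str:
    """Build a selection prompt; with a query it asks for relevance, else salience."""
    task = (
        f"Select the spans most relevant to the question: {query}"
        if query
        else "Select the most salient spans for a summary."
    )
    numbered = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"{FEW_SHOT_DEMO}\n{task}\n\nDocuments:\n{numbered}\n\nSelected spans:"
```

Analogous prompts, or fine-tuned components in the second strategy, would handle the planning and sentence-fusion steps.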

Experimental Evaluation

The methodology was evaluated on Multi-document Summarization (MDS) and Long-form Question-answering (LFQA), where it not only maintained but occasionally improved generation quality while achieving high attribution accuracy. Moreover, the produced attributions were significantly more concise than those of baseline approaches, reducing the manual effort required for fact-checking by over 50%.

Implications and Future Directions

The introduction of the "Attribute First, then Generate" framework signifies a pivotal shift towards enhancing the fidelity and utility of generated text by focusing on the granularity of attributions. Its success in producing concisely cited, high-quality text opens avenues for further research and development in locally-attributed text generation, encouraging the exploration of various grounded generation tasks under this new paradigm.

Overall, this structured approach not only addresses the challenge of producing factually accurate and verifiable texts but also streamlines the fact-checking process, marking a notable advance in the development of trustworthy AI-generated content. Future work could extend this paradigm, further refining attribution precision and exploring its applicability across a broader spectrum of text generation tasks.
