Abstract

Recent efforts to address hallucinations in LLMs have focused on attributed text generation, which supplements generated texts with citations of supporting sources for post-generation fact-checking and corrections. Yet, these citations often point to entire documents or paragraphs, burdening users with extensive verification work. In this paper, we introduce a locally-attributable text generation approach, prioritizing concise attributions. Our method, named "Attribute First, then Generate", breaks down the conventional end-to-end generation process into three intuitive steps: content selection, sentence planning, and sequential sentence generation. By initially identifying relevant source segments ("select first") and then conditioning the generation process on them ("then generate"), we ensure these segments also act as the output's fine-grained attributions ("select" becomes "attribute"). Tested on Multi-document Summarization and Long-form Question-answering, our method not only yields more concise citations than the baselines but also maintains, and in some cases enhances, both generation quality and attribution accuracy. Furthermore, it significantly reduces the time required for fact verification by human assessors.

Figure: the model outputs fluent text aligned with the inputs, including detailed attributions to specific highlighted text spans.

Overview

  • The paper introduces a novel approach named 'Attribute First, then Generate' for Grounded Text Generation, prioritizing reliability and factuality by structuring the text generation process into content selection, sentence planning, and sequential sentence generation.

  • It reformulates the task to focus on fine-grained, sentence-level attributions, allowing for more precise fact verification and reducing human effort in identifying relevant source segments.

  • Two implementation strategies, in-context learning and fine-tuning, were explored to adapt the model to the requirements of this approach, covering task-specific needs such as content salience for summarization and query relevance for question answering.

  • Experimental evaluations on Multi-document Summarization (MDS) and Long-form Question-answering (LFQA) showed that the method maintains, and in some cases improves, generation quality and attribution accuracy while producing markedly more concise citations, making fact-checking more efficient.

Locally-attributable Grounded Text Generation through a Structured Multi-step Approach

Introduction

Recent work in Grounded Text Generation has highlighted the need to increase the reliability and factuality of generated texts. The proposed approach, named "Attribute First, then Generate," reshapes the conventional generation process with a multi-step strategy comprising content selection, sentence planning, and sequential sentence generation. Because each output sentence is grounded in specific source segments, the resulting attributions remain concise, which in turn makes fact verification by human assessors substantially faster.

Task Reformulation and Method Overview

The task of Locally-attributable Grounded Text Generation is reformulated to prioritize fine-grained, sentence-level attributions, ensuring that each generated fact is supported by specific text snippets from the source documents. This granular attribution minimizes the user's fact-verification effort by pointing to only the most pertinent snippets, ranging from individual sentences to sub-sentence spans.
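
To make this granularity concrete, a locally-attributed output over a toy two-document input might be represented as follows; the content and the exact data layout here are illustrative assumptions, not the paper's format.

```python
# Toy, invented example: each generated sentence carries citations to the
# specific source spans that support it, rather than to whole documents.
attributed_output = [
    {
        "sentence": "The new light-rail line is expected to open in 2026.",
        "citations": [
            {"doc": 2, "span": "the light-rail line is expected to open in 2026"},
        ],
    },
    {
        "sentence": "Its budget was approved by the city council on Tuesday.",
        "citations": [
            {"doc": 1, "span": "council approved the new transit budget on Tuesday"},
        ],
    },
]
```

A reader verifying the output then only needs to check each sentence against its cited spans, rather than scanning the full source documents.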

The "Attribute First, then Generate" scheme proposes a structured, intuitive approach to text generation by separating the process into distinct steps - starting with content selection to identify relevant source segments, followed by sentence planning to organize these segments into coherent structures, and concluding with sentence-by-sentence generation, ensuring the generation is closely guided by initially selected attributions.

Implementation Strategies

To apply the proposed framework, two strategies were explored: in-context learning and fine-tuning. The in-context learning strategy uses prompt-based techniques to guide the model through each step of the scheme, adapting to task-specific needs such as salience for summarization or query relevance for question answering. The fine-tuning strategy instead tailors model components to the requirements of the "Attribute First, then Generate" approach, adapting content selection and generation accordingly.
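
As a sketch of the in-context variant, the content-selection step could be phrased as a few-shot prompt along the following lines; the demonstration text, instruction wording, and span format are invented for illustration and are not the paper's actual prompts.

```python
# Hypothetical few-shot prompt for the content-selection step: one worked
# demonstration, followed by the real documents and an instruction that asks
# either for salient spans (summarization) or query-relevant spans (QA).
FEW_SHOT_DEMO = """\
Documents:
[1] The city council approved the new transit budget on Tuesday.
[2] Officials said construction of the light-rail line will begin next spring.

Selected spans:
[1] approved the new transit budget on Tuesday
[2] construction of the light-rail line will begin next spring
"""

def build_selection_prompt(documents: list[str], query: str = "") -> str:
    """Build a selection prompt; with a query it asks for relevance, else salience."""
    task = (
        f"Select the spans most relevant to the question: {query}"
        if query
        else "Select the most salient spans for a summary."
    )
    numbered = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return f"{FEW_SHOT_DEMO}\n{task}\n\nDocuments:\n{numbered}\n\nSelected spans:"
```

Analogous prompts, or fine-tuned components in the second strategy, would handle the planning and sentence-fusion steps.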

Experimental Evaluation

The methodology was evaluated on Multi-document Summarization (MDS) and Long-form Question-answering (LFQA), where it not only maintained but occasionally improved generation quality while achieving high attribution accuracy. Moreover, the produced attributions were significantly more concise than those of baseline approaches, reducing the manual effort required for fact-checking by over 50%.

Implications and Future Directions

The introduction of the "Attribute First, then Generate" framework signifies a pivotal shift towards enhancing the fidelity and utility of generated text by focusing on the granularity of attributions. Its success in producing concisely cited, high-quality text opens avenues for further research and development in locally-attributed text generation, encouraging the exploration of various grounded generation tasks under this new paradigm.

Overall, this structured approach not only addresses the challenge of producing factually accurate and verifiable texts but also streamlines the fact-checking process, marking a notable advance in the development of trustworthy AI-generated content. Future work could extend this paradigm, further refining attribution precision and exploring its applicability across a broader spectrum of text generation tasks.
