
Improving Faithfulness of Large Language Models in Summarization via Sliding Generation and Self-Consistency (2407.21443v1)

Published 31 Jul 2024 in cs.CL and cs.AI

Abstract: Although LLMs have demonstrated impressive performance on various tasks, they still suffer from the factual inconsistency problem known as hallucination. For instance, LLMs occasionally generate content that diverges from the source article, and they tend to extract information that appears at the beginning and end of the context, especially in long document summarization. Inspired by these findings, we propose to improve the faithfulness of LLMs in summarization by impelling them to process the entire article more fairly and faithfully. We present a novel summary generation strategy, namely SliSum, which exploits the ideas of sliding windows and self-consistency. Specifically, SliSum divides the source article into overlapping windows and utilizes the LLM to generate local summaries for the content in each window. Finally, SliSum aggregates all local summaries using clustering and a majority voting algorithm to produce a more faithful summary of the entire article. Extensive experiments demonstrate that SliSum significantly improves the faithfulness of diverse LLMs, including LLaMA-2, Claude-2 and GPT-3.5, in both short and long text summarization, while maintaining their fluency and informativeness and without requiring additional fine-tuning or resources. We further conduct qualitative and quantitative studies to investigate why SliSum works and the impact of its hyperparameters on performance.


Summary

  • The paper presents the SliSum approach, which employs a sliding window and self-consistency to generate more faithful summaries.
  • It divides documents into overlapping windows, uses lexical clustering to filter out infrequently supported content, and resolves contradictions among the remaining statements by majority vote.
  • Experiments across datasets show improved factual accuracy and computational efficiency compared to traditional LLM summarization methods.

Improving Faithfulness of LLMs in Summarization via Sliding Generation and Self-Consistency

Introduction

The paper focuses on enhancing the faithfulness of LLMs in summarization tasks, addressing the prevalent issue of hallucination, in which generated text deviates from the source content. The key contribution is the SliSum approach, which combines a sliding-window technique with self-consistency to generate more faithful summaries. The method divides the source document into overlapping windows, generates a local summary for each window, filters out infrequently supported statements via clustering, and resolves contradictions by majority voting.

Methodology

SliSum Architecture

The SliSum framework consists of three main components (a sketch of the full pipeline appears after this list):

  1. Sliding Generation:
    • Articles are divided into overlapping windows.
    • Each window is summarized independently using an LLM.
    • This generates local summaries that can vary in fidelity (Figure 1).

      Figure 1: The pipeline and example of our proposed SliSum approach.

  2. Filtration:
    • Lexical clustering groups similar statements from the local summaries and filters out irrelevant or inaccurate content.
    • Only frequently mentioned statements are retained, minimizing noise and promoting self-consistency.
  3. Aggregation:
    • Contradiction detection identifies semantically conflicting statements about the same topic.
    • A majority vote then selects the most consistent statement, enhancing the faithfulness of the final summary.
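The following is a minimal sketch of how these three stages could fit together. It is not the authors' implementation: the helper callables `summarize`, `cluster`, and `contradicts` (an LLM prompt wrapper, a lexical clustering routine such as DBSCAN over a similarity matrix, and a contradiction detector, respectively), as well as the default window, step, and support values, are assumptions made purely for illustration.

```python
from collections import Counter

def split_into_windows(sentences, window_size, step_size):
    """Sliding Generation: cut the article into overlapping windows of sentences."""
    last_start = max(len(sentences) - window_size, 0)
    return [sentences[s:s + window_size] for s in range(0, last_start + 1, step_size)]

def slisum(sentences, summarize, cluster, contradicts,
           window_size=40, step_size=20, min_support=2):
    # 1) Sliding Generation: summarize each overlapping window independently.
    statements = []
    for window in split_into_windows(sentences, window_size, step_size):
        statements.extend(summarize(" ".join(window)))  # list of statement strings

    # 2) Filtration: cluster lexically similar statements and keep only clusters
    #    with enough support, i.e. content the LLM produced consistently.
    clusters = [c for c in cluster(statements) if len(c) >= min_support]

    # 3) Aggregation: within each cluster, group mutually consistent statements
    #    and keep the one backed by the most local summaries (majority vote).
    summary = []
    for c in clusters:
        votes = Counter()
        for stmt in c:
            key = next((k for k in votes if not contradicts(k, stmt)), stmt)
            votes[key] += 1
        summary.append(votes.most_common(1)[0][0])
    return " ".join(summary)
```

In this sketch, `summarize` would wrap an LLM call that returns a list of statements, and `cluster` could be any lexical clustering method, mirroring the clustering-plus-voting aggregation described above.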

Experiments and Results

The SliSum method was tested across several datasets, including CNN/DM, XSum, arXiv, and PubMed. These evaluations demonstrated that SliSum significantly improves the faithfulness of summaries without compromising fluency or informativeness. Notably, the use of overlapping windows effectively reduced the impact of the LLMs' positional biases (Figure 2).

Figure 2: The performance of GPT-3.5 evaluated on samples of different lengths.

Hyperparameter Analysis

The impact of SliSum's hyperparameters was analyzed in detail, including the window size and the ratio of window size to step size. Results indicate that tuning this ratio improves factual consistency, while excessively long windows degrade summary quality (Figure 3).

Figure 3: Impact of the ratio L_w / L_s (left) and window size (right) on the faithfulness of GPT-3.5.
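For intuition about these two hyperparameters, the short calculation below (with assumed, illustrative values rather than the paper's actual settings) shows how the ratio of window size to step size controls both the number of windows and how many times each sentence gets re-summarized.

```python
def window_stats(doc_len, window_size, step_size):
    """Number of overlapping windows and approximate passes over each sentence."""
    n_windows = max(doc_len - window_size, 0) // step_size + 1
    coverage = window_size / step_size  # ~times each sentence appears in a window
    return n_windows, coverage

# Illustrative example: a 200-sentence article with a 40-sentence window.
for ratio in (1, 2, 4):
    n, cov = window_stats(doc_len=200, window_size=40, step_size=40 // ratio)
    print(f"L_w/L_s = {ratio}: {n} windows, ~{cov:.0f} passes per sentence")
```

A larger ratio gives the voting stage more independent local summaries per statement, at the cost of more LLM calls per article.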

Complexity and Implementation Considerations

Theoretical analysis and empirical tests show that SliSum's cost scales linearly with document length, in contrast to standard single-pass LLM summarization, whose attention cost grows quadratically with input length. Moreover, because each window is summarized independently, the local generation step can be parallelized for additional speedups.
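A back-of-the-envelope version of this argument, assuming the per-window cost is dominated by quadratic self-attention over a fixed window of length L_w with step L_s (notation as in Figure 3), runs as follows:

```latex
% Rough cost model: number of windows times cost per window,
% with L_w and L_s held constant as the article length L grows.
\[
  \underbrace{\left(\frac{L - L_w}{L_s} + 1\right)}_{\text{number of windows}}
  \cdot
  \underbrace{O\!\left(L_w^{2}\right)}_{\text{cost per window}}
  \;=\;
  O\!\left(\frac{L_w^{2}}{L_s}\, L\right)
  \;=\;
  O(L),
  \qquad \text{vs. } O(L^{2}) \text{ for a single full-context pass.}
\]
```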

Conclusion

The SliSum approach offers a practical and effective way to improve the faithfulness of LLM-generated summaries. By combining sliding-window generation with self-consistency mechanisms, SliSum improves both short and long text summarization. Because it reduces hallucination without additional fine-tuning or external resources, it can be integrated into existing LLM pipelines. Future research could extend these techniques to real-time applications and other text generation tasks.
