Emergent Mind

Abstract

While recent LLMs have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performance is far from satisfactory, especially for smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating their advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, incorporating fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo.

Figure: a visual representation of the reward function used in the paper.

Overview

  • This paper focuses on enhancing the credibility and utility of LLMs by enabling them to generate text with accurate and relevant in-text citations, a practice known as attributable text generation.

  • It introduces a new training framework that leverages fine-grained rewards, employing rejection sampling and reinforcement learning to improve the LLM's capability to produce accurate responses and high-quality citations.

  • The methodology emphasizes ensuring the generated text's credibility through fine-grained rewards for correctness, citation recall, and citation precision, tested on QA datasets.

  • Results show that fine-grained rewards significantly outperform traditional holistic reward approaches, highlighting the potential to improve LLM outputs in terms of accuracy and credibility, particularly in generating text with citations.

Enhancing Attributable Text Generation in LLMs with Fine-Grained Rewards

Introduction to Attributable Text Generation

LLMs have shown remarkable capabilities in generating human-like text responses. However, their tendency to produce unverifiable or incorrect information, commonly referred to as "hallucinations," has raised questions regarding their reliability and trustworthiness. As a solution to increase their credibility and utility, recent efforts have been directed towards enabling LLMs to generate text with in-text citations, thereby linking generated claims to credible sources. This process, known as attributable text generation, aims to make the text output more transparent and verifiable by the end-users.

Challenges and Proposed Approach

The generation of accurate and relevant citations represents a multifaceted challenge, involving the assessment of both citation quality and information correctness. Prior attempts to tackle this problem through direct prompts or retrieval-augmented generation (RAG) have shown limitations, particularly for smaller LLMs. In response, this research introduces a new training framework leveraging fine-grained rewards, designed to enhance the LLM’s capabilities in producing both accurate responses and high-quality citations. This approach employs two training algorithms, rejection sampling (RS) and reinforcement learning (RL), utilizing fine-grained automatic evaluation functions for reward signals at both sentence-level and citation-level.
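To make the training recipe concrete, here is a minimal sketch of a rejection-sampling step driven by a fine-grained reward; the generate, reward_fn, and num_samples names are illustrative assumptions, not the paper's exact implementation.

```python
from typing import Callable, List

def rejection_sampling_step(
    prompt: str,
    generate: Callable[[str], str],          # samples one response from the current LLM
    reward_fn: Callable[[str, str], float],  # fine-grained reward for a (prompt, response) pair
    num_samples: int = 8,
) -> str:
    """Sample several candidate responses and keep the one with the highest reward.

    The kept response can then be added to a supervised fine-tuning set,
    so the model learns to imitate its own best-rewarded outputs.
    """
    candidates: List[str] = [generate(prompt) for _ in range(num_samples)]
    best_response = max(candidates, key=lambda c: reward_fn(prompt, c))
    return best_response
```

An RL variant would instead feed the same sentence- and citation-level scores into a policy-optimization loop (e.g., PPO) as dense reward signals, rather than using them to filter samples.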

Methodology

The developed training framework initializes the LLM using distillation from a more powerful model such as ChatGPT, providing a starting point before applying RS or RL training. The methodology focuses on fine-grained rewards across three primary dimensions: correctness of the generated response, citation recall, and citation precision. These rewards aim to ensure that each sentence in the response is not only accurate but also appropriately supported by relevant citations. Experimental analysis is conducted on the Question Answering (QA) datasets from the ALCE benchmark, with further validation on EXPERTQA to assess generalizability.
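As an illustration of how the three reward dimensions could be scored automatically, the sketch below computes a correctness score plus citation recall and precision; the entails helper (e.g., an NLI model), the sentence and citation splitting, and the exact formulas are simplifying assumptions rather than the paper's precise evaluation functions.

```python
from typing import Callable, Dict, List

def fine_grained_rewards(
    response: str,
    sentences: List[str],                 # the response split into sentences
    citations: List[List[str]],           # cited passages attached to each sentence
    gold_answers: List[str],              # short gold answers for the question
    entails: Callable[[str, str], bool],  # NLI-style check: does the premise support the claim?
) -> Dict[str, float]:
    """Illustrative fine-grained rewards: correctness, citation recall, citation precision."""
    # Correctness: fraction of gold answers mentioned anywhere in the response.
    correctness = sum(a.lower() in response.lower() for a in gold_answers) / max(len(gold_answers), 1)

    # Citation recall: fraction of sentences fully supported by their cited passages taken together.
    supported = [entails(" ".join(cites), sent) for sent, cites in zip(sentences, citations)]
    citation_recall = sum(supported) / max(len(sentences), 1)

    # Citation precision: fraction of individual citations that by themselves support their sentence.
    total, relevant = 0, 0
    for sent, cites in zip(sentences, citations):
        for cite in cites:
            total += 1
            relevant += entails(cite, sent)
    citation_precision = relevant / max(total, 1)

    return {
        "correctness": correctness,
        "citation_recall": citation_recall,
        "citation_precision": citation_precision,
    }
```

These sentence- and citation-level scores can then be aggregated into the reward used to rank candidates during rejection sampling or supplied as the signal during RL training.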

Results and Implications

The implementation of fine-grained rewards for attributable generation significantly outperforms traditional holistic reward approaches, demonstrating substantial improvements across all evaluated metrics. Specifically, the combination of RS and RL, when trained with fine-grained rewards, yields the best performance, surpassing established models like ChatGPT in certain metrics. These findings underscore the potential of fine-grained reward systems in refining the accuracy and credibility of LLM outputs, particularly in the context of generating text with citations.

Future Directions

The research suggests several avenues for further exploration. Improving LLMs’ reading comprehension and their capability to synthesize information from multiple documents could enhance correctness recall in QA tasks. Additionally, the iterative application of imitation learning and reinforcement learning might push the limits of small LLMs in attributable text generation. Addressing the initial reliance on distillation from larger models like ChatGPT could pave the way for a more accessible and iterative training process, independent of proprietary LLMs.

Conclusion

The introduction of fine-grained rewards in the training of LLMs for attributable text generation marks a significant advancement in enhancing the models' ability to generate trustworthy and verifiable content. By focusing on citation quality and information correctness, this approach addresses a critical gap in existing LLM applications, promising a future where LLM-generated text can be both informative and credible.
