Emergent Mind

Abstract

While recent LLMs have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performance is far from satisfactory, especially for smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating their advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, incorporating fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo.

Figure: a visual representation of the reward function used in the paper.

Overview

  • This paper focuses on enhancing the credibility and utility of LLMs by enabling them to generate text with accurate and relevant in-text citations, a practice known as attributable text generation.

  • It introduces a new training framework that leverages fine-grained rewards, employing rejection sampling and reinforcement learning to improve the LLM's capability to produce accurate responses and high-quality citations.

  • The methodology emphasizes ensuring the generated text's credibility through fine-grained rewards for correctness, citation recall, and citation precision, tested on QA datasets.

  • Results show that fine-grained rewards significantly outperform traditional holistic reward approaches, highlighting the potential to improve LLM outputs in terms of accuracy and credibility, particularly in generating text with citations.

Enhancing Attributable Text Generation in LLMs with Fine-Grained Rewards

Introduction to Attributable Text Generation

LLMs have shown remarkable capabilities in generating human-like text responses. However, their tendency to produce unverifiable or incorrect information, commonly referred to as "hallucinations," has raised questions regarding their reliability and trustworthiness. As a solution to increase their credibility and utility, recent efforts have been directed towards enabling LLMs to generate text with in-text citations, thereby linking generated claims to credible sources. This process, known as attributable text generation, aims to make the text output more transparent and verifiable by the end-users.

Challenges and Proposed Approach

The generation of accurate and relevant citations represents a multifaceted challenge, involving the assessment of both citation quality and information correctness. Prior attempts to tackle this problem through direct prompts or retrieval-augmented generation (RAG) have shown limitations, particularly for smaller LLMs. In response, this research introduces a new training framework leveraging fine-grained rewards, designed to enhance the LLM’s capabilities in producing both accurate responses and high-quality citations. This approach employs two training algorithms, rejection sampling (RS) and reinforcement learning (RL), utilizing fine-grained automatic evaluation functions for reward signals at both sentence-level and citation-level.
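To make the training recipe concrete, here is a minimal sketch of a rejection-sampling step driven by a fine-grained reward; the generate, reward_fn, and num_samples names are illustrative assumptions, not the paper's exact implementation.

```python
from typing import Callable, List

def rejection_sampling_step(
    prompt: str,
    generate: Callable[[str], str],          # samples one response from the current LLM
    reward_fn: Callable[[str, str], float],  # fine-grained reward for a (prompt, response) pair
    num_samples: int = 8,
) -> str:
    """Sample several candidate responses and keep the one with the highest reward.

    The kept response can then be added to a supervised fine-tuning set,
    so the model learns to imitate its own best-rewarded outputs.
    """
    candidates: List[str] = [generate(prompt) for _ in range(num_samples)]
    best_response = max(candidates, key=lambda c: reward_fn(prompt, c))
    return best_response
```

An RL variant would instead feed the same sentence- and citation-level scores into a policy-optimization loop (e.g., PPO) as dense reward signals, rather than using them to filter samples.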

Methodology

The developed training framework initializes the LLM using distillation from a more powerful model such as ChatGPT, providing a starting point before applying RS or RL training. The methodology focuses on fine-grained rewards across three primary dimensions: correctness of the generated response, citation recall, and citation precision. These rewards aim to ensure that each sentence in the response is not only accurate but also appropriately supported by relevant citations. Experimental analysis is conducted on the Question Answering (QA) datasets from the ALCE benchmark, with further validation on EXPERTQA to assess generalizability.
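As an illustration of how the three reward dimensions could be scored automatically, the sketch below computes a correctness score plus citation recall and precision; the entails helper (e.g., an NLI model), the sentence and citation splitting, and the exact formulas are simplifying assumptions rather than the paper's precise evaluation functions.

```python
from typing import Callable, Dict, List

def fine_grained_rewards(
    response: str,
    sentences: List[str],                 # the response split into sentences
    citations: List[List[str]],           # cited passages attached to each sentence
    gold_answers: List[str],              # short gold answers for the question
    entails: Callable[[str, str], bool],  # NLI-style check: does the premise support the claim?
) -> Dict[str, float]:
    """Illustrative fine-grained rewards: correctness, citation recall, citation precision."""
    # Correctness: fraction of gold answers mentioned anywhere in the response.
    correctness = sum(a.lower() in response.lower() for a in gold_answers) / max(len(gold_answers), 1)

    # Citation recall: fraction of sentences fully supported by their cited passages taken together.
    supported = [entails(" ".join(cites), sent) for sent, cites in zip(sentences, citations)]
    citation_recall = sum(supported) / max(len(sentences), 1)

    # Citation precision: fraction of individual citations that by themselves support their sentence.
    total, relevant = 0, 0
    for sent, cites in zip(sentences, citations):
        for cite in cites:
            total += 1
            relevant += entails(cite, sent)
    citation_precision = relevant / max(total, 1)

    return {
        "correctness": correctness,
        "citation_recall": citation_recall,
        "citation_precision": citation_precision,
    }
```

These sentence- and citation-level scores can then be aggregated into the reward used to rank candidates during rejection sampling or supplied as the signal during RL training.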

Results and Implications

The implementation of fine-grained rewards for attributable generation significantly outperforms traditional holistic reward approaches, demonstrating substantial improvements across all evaluated metrics. Specifically, the combination of RS and RL, when trained with fine-grained rewards, yields the best performance, surpassing established models like ChatGPT in certain metrics. These findings underscore the potential of fine-grained reward systems in refining the accuracy and credibility of LLM outputs, particularly in the context of generating text with citations.

Future Directions

The research suggests several avenues for further exploration. Improving LLMs’ reading comprehension and their capability to synthesize information from multiple documents could enhance correctness recall in QA tasks. Additionally, the iterative application of imitation learning and reinforcement learning might push the limits of small LLMs in attributable text generation. Addressing the initial reliance on distillation from larger models like ChatGPT could pave the way for a more accessible and iterative training process, independent of proprietary LLMs.

Conclusion

The introduction of fine-grained rewards in the training of LLMs for attributable text generation marks a significant advancement in enhancing the models' ability to generate trustworthy and verifiable content. By focusing on citation quality and information correctness, this approach addresses a critical gap in existing LLM applications, promising a future where LLM-generated text can be both informative and credible.
