LLMs learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs such ability, we explore source-aware training -- a post pretraining recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training can easily be applied to pretrained LLMs off the shelf, and diverges minimally from existing pretraining/fine-tuning frameworks. Through experiments on carefully curated data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's quality compared to standard pretraining. Our results also highlight the importance of data augmentation in achieving attribution. Code and data available here: \url{https://github.com/mukhal/intrinsic-source-citation}
This paper introduces a method called source-aware training, designed to enable language models to cite the original documents from which their responses are derived, increasing transparency, interpretability, and verifiability.
Source-aware training consists of two phases: training language models to associate responses with the source document identifiers, and instruction-tuning to prompt models to cite these sources accurately.
Experimental results show that the strategy of document identifier injection and data augmentation significantly affects the model's citation abilities, maintaining quality performance on tasks like intrinsic citation and external datasets evaluation.
The approach has broad practical implications for developing trustworthy AI systems and sets the stage for future research on enhancing models' citation capabilities and investigating the balance between performance, citation accuracy, and training efficiency.
In the terrain of language model research, the capacity for models to accurately attribute their generated responses to specific pretraining sources is gaining critical importance. This work explore a methodological advancement known as source-aware training, aimed at enabling language models to inherently cite the original documents their responses are derived from. Such capability enriches models with greater transparency, interpretability, and verifiability—a trinity crucial for leveraging language models across a spectrum of applications demanding reliability and accountability.
The inherent challenge tackled in this study revolves around intrinsic source citation, where a model must identify and cite the pretraining source(s) underpinning its generated response. Distinct from retrieval-based or post-hoc citation methods, intrinsic source citation embeds the citation capability within the model, potentially leading to more accurate and faithful attribution. The source-aware training approach developed herein consists of two phases: (i) training language models to associate responses with unique identifiers of source documents, and (ii) instruction-tuning that teaches models to cite relevant pretraining sources upon request. The elegance of this procedure lies in its compatibility with existing pretrained models and its minimal departures from established pretraining and fine-tuning frameworks.
The source-aware training methodology encompasses:
Notably, the method involves experimenting with various strategies for document identifier injection, such as placing identifiers at the beginning, within, or at the end of documents. It crucially highlights the necessity of data augmentation for enhancing model performance in source attribution tasks.
The study reveals several key findings:
Practical implications of this research are substantial, carrying weight for the development of trustworthy AI systems. By endowing models with the capability to cite their knowledge sources, developers and users can gain insights into the provenance of model outputs, enhancing trust and enabling easier verification. The methodology also opens doors to refined model interpretability and paves the way for addressing concerns related to data privacy, copyright infringements, and information veracity within model training corpora.
Looking ahead, future research avenues could explore scaling source-aware training to larger datasets, refining document identifier strategies for more complex information types, and extending citation capabilities to encompass a wider array of knowledge forms beyond factual content. Additionally, investigating the balance between model performance, source citation accuracy, and training efficiency remains a fertile ground for further inquiry.
In summary, this work sets a foundational stone in the quest for building language models cognizant of their knowledge origins, marking a step forward in the journey towards AI systems that are not only powerful but also principled and accountable.