Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias (2405.15739v3)
Abstract: Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of LLMs introduces a new dynamic to these practices. Yet the characteristics and potential biases of references recommended by LLMs that rely entirely on their parametric knowledge, rather than on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset of papers from AAAI, NeurIPS, ICML, and ICLR published after GPT-4's knowledge cut-off date. In the experiment, LLMs are tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced bias toward highly cited works, which persists even after controlling for publication year, title length, number of authors, and venue. The results hold both for GPT-4 and for the more capable models GPT-4o and Claude 3.5, for which the papers are part of the training data. Additionally, we observe strong consistency between the characteristics of the existing and the non-existent references generated by the LLMs, indicating the models' internalization of citation patterns. By analyzing citation graphs, we show that the recommended references are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases, such as the Matthew effect, and introduce new ones, potentially skewing scientific knowledge dissemination.
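To make the setup concrete, below is a minimal sketch of the two steps the abstract describes: prompting a model to fill in an anonymized in-text citation from parametric knowledge alone, and checking how close a suggested reference sits to the citing paper in a citation graph. The prompt wording, the `[CITATION]` masking token, and the toy graph are illustrative assumptions, not the authors' exact protocol; only the OpenAI chat completions API and `networkx` calls are standard.

```python
# Minimal sketch of the experimental setup described in the abstract.
# Assumptions (not the authors' exact protocol): the prompt wording,
# the "[CITATION]" masking token, and the toy citation graph below.
from openai import OpenAI
import networkx as nx

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: ask the model to fill in an anonymized in-text citation,
# relying purely on parametric knowledge (no search, no retrieval).
excerpt = (
    "Chain-of-thought prompting has been shown to improve multi-step "
    "reasoning in large language models [CITATION]."
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": ("You suggest scholarly references. Given a passage "
                     "with a masked citation, return the single reference "
                     "(authors, title, venue, year) that best fits "
                     "[CITATION].")},
        {"role": "user", "content": excerpt},
    ],
    temperature=0,
)
suggested = response.choices[0].message.content
print(suggested)

# Step 2: check whether a suggested (and existing) reference is embedded
# in the citing paper's citation neighborhood, here via shortest-path
# distance in a toy directed citation graph (paper -> cited paper).
G = nx.DiGraph()
G.add_edges_from([
    ("citing_paper", "ref_A"),
    ("citing_paper", "ref_B"),
    ("ref_A", "suggested_ref"),  # suggested_ref is two hops away
])
dist = nx.shortest_path_length(G.to_undirected(),
                               "citing_paper", "suggested_ref")
print(f"graph distance to suggested reference: {dist}")
```

In a full study, each suggestion would be resolved against a bibliographic database (e.g., Semantic Scholar) to determine whether it exists and to record its citation count, and distances would be computed on the real citation graph rather than a toy one.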