Delving into LLM-assisted writing in biomedical publications through excess vocabulary (2406.07016v5)

Published 11 Jun 2024 in cs.CL, cs.AI, cs.CY, cs.DL, and cs.SI

Abstract: LLMs like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how wide-spread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010--2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.

Summary

The paper reveals a post-2022 surge in excess vocabulary, demonstrating ChatGPT’s transformative impact on academic writing.
It employs a novel large-scale excess word analysis on over 15 million PubMed abstracts from 2010 to 2024 to quantify LLM influence.
Results vary across disciplines, countries, and journals, with the estimated lower bound reaching 40% of abstracts in some subcorpora, highlighting ethical and methodological challenges.

Delving into ChatGPT Usage in Academic Writing Through Excess Vocabulary

The paper, authored by Dmitry Kobak, Rita Gonzalez-Marquez, Emőke-Agnes Horvát, and Jan Lause, investigates the unprecedented impact of LLMs, specifically ChatGPT, on scientific writing by analyzing shifts in vocabulary within PubMed abstracts. The work employs a novel, data-driven approach that does not rely on ground-truth examples of LLM-generated text to estimate the extent of LLM usage.

Summary of Findings

Kobak et al. analyzed over 15 million PubMed abstracts from 2010 to 2024 to quantify changes in vocabulary and infer the influence of LLMs. The analysis revealed an abrupt increase in the frequency of certain style words following the release of ChatGPT in late 2022. Specifically, the 2024 corpus exhibited an unprecedented quantity of excess vocabulary, suggesting that at least 13.5% of the abstracts were processed with LLMs. This is a conservative lower-bound estimate. The lower bound varied across disciplines, countries, and journals, reaching as high as 40% for some subcorpora.

Methodology

The researchers used a large-scale approach based on tracking "excess words": vocabulary whose usage frequency increased markedly once LLMs became available. The method is inspired by the concept of excess mortality used during the COVID-19 pandemic. For each word, they compared its observed frequency in 2024 abstracts with a counterfactual projection extrapolated from the pre-LLM years 2021 and 2022, and flagged words with a large gap between the observed and projected frequencies.
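The comparison described above can be illustrated with a minimal Python sketch. It assumes the counterfactual is a straight-line extrapolation from the 2021 and 2022 frequencies to 2024; the exact projection formula is not given in this summary, so that choice, the function name `excess_usage`, and the example frequencies are all hypothetical.

```python
def excess_usage(p_2021, p_2022, p_2024):
    """Compare an observed 2024 word frequency with a projected one.

    Each argument is the fraction of abstracts in that year containing
    the word. Returns (gap, ratio): the difference and the quotient of
    the observed and projected 2024 frequencies.
    """
    # ASSUMPTION: linear extrapolation two years ahead (2022 -> 2024).
    slope = p_2022 - p_2021
    # A frequency cannot be negative, so clamp the projection at zero.
    projected = max(p_2022 + 2 * slope, 0.0)
    gap = p_2024 - projected
    ratio = p_2024 / projected if projected > 0 else float("inf")
    return gap, ratio


# Made-up frequencies for a single hypothetical style word.
gap, ratio = excess_usage(p_2021=0.0010, p_2022=0.0012, p_2024=0.0100)
print(f"excess gap: {gap:.4f}, excess ratio: {ratio:.2f}")
```

With these made-up inputs the projection is about 0.0016, so the word appears in roughly 0.84 percentage points more abstracts than expected, about six times the projected frequency. A gap is more informative for common words and a ratio for rare ones, which is why the sketch returns both.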

Key Results

Key findings of this paper include:

Excess Style Words: The analysis identified hundreds of style words whose frequency abruptly increased in 2024. These included verbs and adjectives such as "delves," "showcasing," "crucial," and "pivotal."
Quantitative Impact: The overall lower bound for LLM-processed abstracts in 2024 was estimated at 13.5%, reaching up to 40% in some subcorpora, such as computational fields and certain geographical regions.
Heterogeneity: There were significant variations in LLM adoption rates among different countries, fields, and journals. For instance, journals with simplified review processes, like those by MDPI, showed much higher LLM usage.
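Comparative Analysis: The vocabulary shift associated with LLMs exceeded even the shift seen during the COVID-19 pandemic. The two also differed in kind: the LLM-era excess consisted mostly of style words, whereas the pandemic-era excess consisted of content-related words.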
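The reasoning behind the lower-bound estimate can be sketched in a few lines of Python. The idea, as understood from this summary, is that if a set of marker words appears in a larger share of abstracts than projected, the surplus share must come from LLM-processed abstracts; because LLM-processed abstracts need not contain any marker word, the surplus is only a lower bound. The helper names, the toy abstracts, and the projected share of 0.1 are hypothetical and are not taken from the paper.

```python
import re


def fraction_with_markers(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    marker_set = {m.lower() for m in markers}
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        # Crude tokenization: lowercase alphabetic runs only.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & marker_set:
            hits += 1
    return hits / len(abstracts)


def lower_bound(observed_fraction, projected_fraction):
    """Surplus share of abstracts with marker words, floored at zero."""
    return max(observed_fraction - projected_fraction, 0.0)


# Toy corpus; the real analysis uses millions of abstracts.
toy_abstracts = [
    "This study delves into tumor growth.",
    "We measured blood pressure in mice.",
    "A pivotal and crucial finding is reported.",
    "Results were significant.",
]
observed = fraction_with_markers(toy_abstracts, ["delves", "pivotal", "crucial"])
# ASSUMPTION: 0.1 stands in for a projection from pre-LLM years.
print(lower_bound(observed, projected_fraction=0.1))
```

In the toy corpus two of the four abstracts contain a marker word, so the observed share is 0.5 and the surplus over the assumed projection is about 0.4.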

Implications

Practical Implications

The paper underscores the substantial and growing influence of LLMs in scientific writing. These models, while improving grammatical correctness and readability, can propagate biases, fabricate information, and produce less diverse and innovative content. Moreover, the detection of such widespread LLM usage, even in high-prestige journals, calls into question the integrity of current publication practices.

Theoretical Implications

From a theoretical perspective, the unprecedented shifts in writing style imposed by LLMs highlight the emergent capabilities and integration of artificial intelligence in academic practices. This trend stresses the importance of developing robust methods to detect and monitor AI-generated content in scholarly work. It also suggests a potential future where AI not only assists but co-authors research, raising ethical and practical challenges regarding authorship and accountability.

Future Prospects

Given the trends observed, future research could explore more sophisticated detection mechanisms for LLM usage, considering the rapid advancements in AI technologies. Additionally, there is a need for ongoing monitoring to reassess the extent of LLM impact as their adoption continues to grow. Policy changes advocating for transparency and responsible AI usage in academic contexts are essential to mitigate potential risks while harnessing the benefits of these powerful tools.

Conclusion

Kobak et al. have highlighted an important transition in academic writing precipitated by the adoption of LLMs. Their pioneering approach offers a robust framework for tracking and understanding this shift, providing critical insights for both the academic community and policymakers. As LLMs continue to evolve, their influence on the scientific literature will likely intensify, necessitating vigilant and adaptive strategies to balance innovation with integrity in scholarly communication.

This paper serves as a foundational analysis for understanding the nuanced impacts of LLMs on academic writing, setting the stage for future work to explore and address the challenges and opportunities posed by AI in academia.