Mapping the Increasing Use of LLMs in Scientific Papers (2404.01268v1)
Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using LLMs like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates at the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.
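The population-level estimation described in the abstract can be sketched as a maximum-likelihood fit of a two-component mixture: the observed token distribution is modeled as a blend of a known human-written distribution and a known LLM-generated distribution, and the mixture weight alpha is the estimated fraction of LLM-modified content. The function name, grid-search optimizer, and toy probabilities below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def estimate_llm_fraction(counts, p_human, p_ai, grid=1000):
    """Maximum-likelihood estimate of the mixture fraction alpha in [0, 1].

    counts:  observed occurrence counts for each marker token in the corpus
    p_human: token probabilities estimated from known human-written text
    p_ai:    token probabilities estimated from known LLM-generated text
    Maximizes  sum_w counts[w] * log((1 - alpha) * p_human[w] + alpha * p_ai[w])
    by a simple grid search over alpha.
    """
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid + 1):
        alpha = i / grid
        ll = sum(c * math.log((1 - alpha) * ph + alpha * pa)
                 for c, ph, pa in zip(counts, p_human, p_ai))
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

# Toy corpus of two marker tokens: the first token is far more likely
# under LLM-generated text, and it dominates the observed counts.
counts  = [80, 20]
p_human = [0.2, 0.8]
p_ai    = [0.9, 0.1]
alpha_hat = estimate_llm_fraction(counts, p_human, p_ai)  # close to 6/7
```

Because the estimate is computed over aggregate corpus counts rather than per-document classifications, it remains stable even when no single paper can be confidently labeled, which is the robustness property the abstract refers to.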
- Scott Aaronson. Simons Institute Talk on Watermarking of Large Language Models, 2023. URL https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17.
- Natural Language Watermarking: Design, Analysis, and a Proof-of-Concept Implementation. In Information Hiding, 2001.
- Identifying Real or Fake Articles: Towards better Language Modeling. In International Joint Conference on Natural Language Processing, 2008.
- Real or Fake? Learning to Discriminate Machine from Human Generated Text. arXiv preprint arXiv:1906.03351, 2019.
- Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature. arXiv preprint arXiv:2310.05130, 2023.
- Daria Beresneva. Computer-Generated Text Detection Using Machine Learning: A Systematic Review. In International Conference on Applications of Natural Language to Data Bases, 2016.
- Squibs: What Is a Paraphrase? Computational Linguistics, 39:463–472, 2013.
- ConDA: Contrastive Domain Adaptation for AI-generated Text Detection. arXiv preprint arXiv:2309.03992, 2023.
- On the Possibilities of AI-Generated Text Detection. arXiv preprint arXiv:2304.04736, 2023.
- GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content. arXiv preprint arXiv:2305.07969, 2023.
- Natural Language Watermarking Using Semantic Substitution for Chinese Text. In International Workshop on Digital Watermarking, 2003.
- Gemma Conroy. How ChatGPT and other AI tools could disrupt scientific publishing. Nature, October 2023a. URL https://www.nature.com/articles/d41586-023-03144-w.
- Gemma Conroy. Scientific sleuths spot dishonest ChatGPT use in papers. Nature, September 2023b. URL https://www.nature.com/articles/d41586-023-02477-w.
- Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. arXiv preprint arXiv:2210.07321, 2022.
- Mack Deguerin. AI-generated nonsense is leaking into scientific journals. Popular Science, March 2024. URL https://www.popsci.com/technology/ai-generated-text-scientific-journals/.
- Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality, 2023. Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-013.
- What’s In My Big Data? In The Twelfth International Conference on Learning Representations, 2023.
- Holly Else. Abstracts written by ChatGPT fool scientists. Nature, Jan 2023. URL https://www.nature.com/articles/d41586-023-00056-7.
- TweepFake: About detecting deepfake tweets. Plos one, 16(5):e0251415, 2021.
- Three Bricks to Consolidate Watermarks for Large Language Models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–6, 2023.
- Tradition and innovation in scientists’ research strategies. American sociological review, 80(5):875–908, 2015.
- Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv, pp. 2022–12, 2022.
- The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
- GLTR: Statistical Detection and Visualization of Generated Text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 111–116, 2019.
- Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey. arXiv preprint arXiv:2310.15264, 2023.
- ’Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion. arXiv preprint arXiv:2310.19981, 2023.
- Melissa Heikkilä. How to spot AI-generated text. MIT Technology Review, Dec 2022. URL https://www.technologyreview.com/2022/12/19/1065596/how-to-spot-ai-generated-text/.
- RADAR: Robust AI-Text Detection via Adversarial Learning. arXiv preprint arXiv:2307.03838, 2023a.
- Unbiased Watermark for Large Language Models. arXiv preprint arXiv:2310.10669, 2023b.
- ICML. Clarification on Large Language Model Policy (LLM). https://icml.cc/Conferences/2023/llm-policy, 2023.
- Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650, 2019.
- Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314, 2020.
- Samantha Murphy Kelly. ChatGPT creator pulls AI detection tool due to ‘low rate of accuracy’. CNN Business, Jul 2023. URL https://www.cnn.com/2023/07/25/tech/openai-ai-detection-tool/index.html.
- Recalibrating the scope of scholarly publishing: A modest step in a vast decolonization process. Quantitative Science Studies, 3(4):912–930, 12 2022. ISSN 2641-3337. doi: 10.1162/qss_a_00228. URL https://doi.org/10.1162/qss_a_00228.
- A watermark for large language models. International Conference on Machine Learning, 2023.
- New AI classifier for indicating AI-written text, 2023. URL https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text.
- Robust Distortion-free Watermarks for Language Models. arXiv preprint arXiv:2307.15593, 2023.
- Detecting Fake Content with Relative Entropy Scoring. Pan, 2008.
- Deepfake Text Detection in the Wild. arXiv preprint arXiv:2305.13242, 2023.
- GPT detectors are biased against non-native English writers. arXiv preprint arXiv:2304.02819, 2023a.
- Can large language models provide useful feedback on research papers? A large-scale empirical analysis. arXiv preprint arXiv:2310.01783, 2023b.
- Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. arXiv preprint arXiv:2403.07183, 2024.
- Reviewergpt? an exploratory study on using large language models for paper reviewing. arXiv preprint arXiv:2306.00622, 2023.
- CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning. arXiv preprint arXiv:2212.10341, 2022.
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692, 2019.
- MacroPolo. The Global AI Talent Tracker, 2024. URL https://macropolo.org/digital-projects/the-global-ai-talent-tracker/.
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. arXiv preprint arXiv:2301.11305, 2023.
- Paulina Okunytė. Google search exposes academics using ChatGPT in research papers. Cybernews, November 2023. URL https://cybernews.com/news/academic-cheating-chatgpt-openai/.
- OpenAI. GPT-2: 1.5B release. https://openai.com/research/gpt-2-1-5b-release, 2019. Accessed: 2019-11-05.
- Papers and peer reviews with evidence of ChatGPT writing. Retraction Watch, 2024. URL https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
- Can AI-Generated Text be Reliably Detected? arXiv preprint arXiv:2303.11156, 2023.
- Whose opinions do language models reflect? In International Conference on Machine Learning, pp. 29971–30004. PMLR, 2023.
- Red Teaming Language Model Detectors with Language Models. arXiv preprint arXiv:2305.19713, 2023.
- The curse of recursion: Training on generated data makes models forget. arXiv preprint arXiv:2305.17493, 2023.
- Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203, 2019.
- H. Holden Thorp. ChatGPT is fun, but not an author. Science, 379(6630):313–313, 2023. doi: 10.1126/science.adg7879. URL https://www.science.org/doi/abs/10.1126/science.adg7879.
- Natural language watermarking: challenges in building a practical system. In Electronic imaging, 2006a.
- The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. In Workshop on Multimedia & Security, 2006b.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts. arXiv preprint arXiv:2306.04723, 2023.
- Authorship Attribution for Neural Text Generation. In Conference on Empirical Methods in Natural Language Processing, 2020.
- Daan van Rossum. Generative AI Top 150: The World’s Most Used AI Tools. https://www.flexos.work/learn/generative-ai-top-150, February 2024.
- Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks. arXiv preprint arXiv:2306.07899, 2023.
- James Vincent. ‘As an AI language model’: the phrase that shows how AI is polluting the web. The Verge, Apr 2023. URL https://www.theverge.com/2023/4/25/23697218/ai-generated-spam-fake-user-reviews-as-an-ai-language-model.
- Testing of detection tools for AI-generated text. International Journal for Educational Integrity, 19(1):26, 2023. ISSN 1833-2595. doi: 10.1007/s40979-023-00146-z. URL https://doi.org/10.1007/s40979-023-00146-z.
- Max Wolff. Attacking Neural Text Detectors. ArXiv, abs/2002.11768, 2020.
- DiPmark: A Stealthy, Efficient and Resilient Watermark for Large Language Models. arXiv preprint arXiv:2310.07710, 2023.
- DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text. arXiv preprint arXiv:2305.17359, 2023a.
- A Survey on Detection of LLMs-Generated Content. arXiv preprint arXiv:2310.15654, 2023b.
- Robust Multi-bit Natural Language Watermarking through Invariant Features. In Annual Meeting of the Association for Computational Linguistics, 2023.
- GPT Paternity Test: GPT Generated Text Detection with GPT Genetic Inheritance. arXiv preprint arXiv:2305.12519, 2023.
- Defending Against Neural Fake News. arXiv preprint arXiv:1905.12616, 2019.
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors. arXiv preprint arXiv:2312.12918, 2023.
- Protecting Language Generation Models via Invisible Watermarking. In Proceedings of the 40th International Conference on Machine Learning, pp. 42187–42199, 2023.
- Provable Robust Watermarking for AI-Generated Text. In International Conference on Learning Representations (ICLR), 2024a.
- Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs. arXiv preprint arXiv:2402.05864, 2024b.