Emergent Mind

Abstract

We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by an LLM. Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e., beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, were submitted close to the deadline, and came from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text that may be too subtle to detect at the individual level, and discuss the implications of such trends for peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

Figure: Estimated usage of LLMs in ML conference reviews spiked after the ChatGPT release; no comparable spike was observed in the Nature Portfolio journals.

Overview

  • This study presents a novel approach using a maximum likelihood estimation model to identify AI-modified text in peer reviews for major AI conferences, estimating that 6.5% to 16.9% of review text was substantially modified by AI.

  • It finds a correlation between AI-modified content and factors like submission deadlines, lack of scholarly citations, and lower participation in post-review discussions.

  • The research highlights concerns about the potential homogenization of scholarly feedback due to AI-generated content, indicating a reduction in linguistic and epistemic diversity.

  • The findings advocate for more transparent guidelines on AI use in scholarly writing and suggest future research should continue exploring the impact of AI tools such as LLMs on scientific communication.

Monitoring AI-Modified Content at Scale in the Peer Review Process

Motivation and Approach

Peer reviews are fundamental to the scientific publication process, ensuring the relevance, rigor, and originality of scientific work. The advent of generative AI, like ChatGPT, has introduced potential changes in how reviews are composed, possibly impacting their quality and authenticity. This study introduces a novel framework, leveraging a maximum likelihood model, to estimate the proportion of corpus content likely modified by AI at a large scale. Focusing on peer reviews from major AI conferences post-ChatGPT's release, this research uncovers patterns in AI-generated text use and discusses the broader implications for the peer review ecosystem.

Statistical Estimation Framework

At the core of this study is a maximum likelihood estimation (MLE) approach designed to efficiently discern the extent of AI modification in large text corpora. By comparing known human-written and AI-generated documents, the framework estimates the distribution of texts in a given corpus that resemble either category. A critical aspect of this methodology is its ability to operate without the need for direct analysis of individual documents, making it vastly more computationally efficient and less prone to the biases of existing AI detection tools.
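The estimator described above can be sketched as a one-parameter mixture model: each document's likelihood under a known human-written reference distribution and a known AI-generated reference distribution is assumed given, and the corpus-level AI fraction α is found by maximizing the mixture log-likelihood P(x) = α·P_ai(x) + (1 − α)·P_human(x). This is a minimal illustration under those assumptions, not the authors' implementation; in particular, the step of estimating per-document likelihoods `p_human` and `p_ai` from reference corpora is omitted here.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_alpha(p_human, p_ai):
    """MLE of the fraction alpha of documents drawn from the AI
    distribution, under the two-component mixture
        P(x) = alpha * P_ai(x) + (1 - alpha) * P_human(x).

    p_human, p_ai: arrays of per-document likelihoods under the two
    reference distributions (assumed precomputed from reference corpora).
    """
    p_human = np.asarray(p_human, dtype=float)
    p_ai = np.asarray(p_ai, dtype=float)

    def neg_log_likelihood(alpha):
        mix = alpha * p_ai + (1.0 - alpha) * p_human
        # Small constant guards against log(0) for extreme likelihoods.
        return -np.sum(np.log(mix + 1e-12))

    # alpha is a probability, so the search is bounded to [0, 1].
    res = minimize_scalar(neg_log_likelihood, bounds=(0.0, 1.0), method="bounded")
    return res.x
```

Because the optimization is over a single scalar rather than per-document labels, the corpus-level estimate can be computed cheaply even for very large corpora, which mirrors the efficiency argument made above.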

Case Study and Main Findings

The application of this framework to peer reviews from ICLR, NeurIPS, CoRL, and EMNLP conferences reveals significant insights:

  • An estimated 6.5% to 16.9% of review sentences in these conferences were substantially modified by AI.
  • Higher AI modification rates were observed in reviews submitted closer to deadlines, reviews without scholarly citations, and reviews from reviewers who engaged less in the post-rebuttal discussion phase.
  • A notable correlation between the presence of AI-modified content and reduced linguistic and epistemic diversity in reviews, raising concerns about the homogenization of scholarly feedback.

These findings highlight a nuanced picture of AI use in scientific peer review, pointing to both its potential advantages in aiding reviewers and the risks it poses to the integrity and diversity of scholarly discourse.

Theoretical Implications

This study's theoretical contributions include a robust MLE framework capable of analyzing AI-generated content across large datasets and a detailed case study of its application within the domain of scientific peer review. The methodology provides a generalizable tool for future research into AI's impact across different information ecosystems.

Practical Implications

From a practical standpoint, this research raises important questions about the role of AI in the peer review process. The detected trends in AI use and the associated impact on review content quality and diversity underscore the need for greater transparency and guidelines around AI-assisted writing in scholarly publications. Furthermore, the findings call for interdisciplinary efforts to understand and navigate the evolving landscape of AI-generated content in scientific discourse.

Future Directions

Looking ahead, the study advocates for continued investigation into the broad implications of LLM use in scientific communication. As AI tools become increasingly sophisticated, understanding their effects on scholarly practices, from peer review to research dissemination, will be critical. Collaborative efforts combining computational, ethical, and sociological perspectives are essential to ensure AI's responsible integration into the scientific community.

Conclusion

The exploration of AI-modified content in AI conference peer reviews post-ChatGPT reveals a complex interplay between technology and scientific communication. By providing a scalable and efficient method for estimating AI influence, this study contributes valuable tools and insights for navigating the future of AI in academia, urging careful consideration of its benefits and challenges.
