
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (2301.11305v2)

Published 26 Jan 2023 in cs.CL and cs.AI

Abstract: The increasing fluency and widespread usage of LLMs highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained LLM (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.

Citations (461)

Summary

  • The paper introduces a novel detection method that exploits probability curvature differences, using random perturbations to differentiate machine-generated text from human-written text.
  • It computes log probability discrepancies from minor text perturbations, achieving significant AUROC improvements over baseline zero-shot approaches.
  • Empirical results on models like GPT-2, GPT-Neo, and GPT-J demonstrate DetectGPT's robustness and efficiency in real-world authenticity verification.

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (arXiv:2301.11305)

Introduction

The paper introduces DetectGPT, a method that uses a curvature-based criterion to distinguish machine-generated text from human-written content. Distinctively, it operates without training a separate classifier or collecting a dataset of real and generated passages, relying only on log probabilities computed by the model of interest. Central to the method is the observation that text generated by an LLM tends to lie in negative-curvature regions of the model's log probability function, an insight that DetectGPT exploits using random perturbations of the candidate passage.

Curvature of Log Probability Function

DetectGPT's foundation lies in the hypothesis that machine-generated text, when subjected to minor perturbations, shows a systematic decrease in log probability relative to the original text, whereas perturbations of human-written text exhibit no such consistent behavior. The method relies only on the local structure of the log probability function around the candidate passage, using a perturbation function (a generic mask-filling model such as T5) to approximate a quantity related to the trace of the Hessian of the log probability (Figure 1).

Figure 1: We determine if a text was generated by an LLM by averaging log probability ratios of original and perturbed samples.
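The paper formalizes this idea as a perturbation discrepancy. A minimal statement of the criterion, following the paper's definition, with p_θ denoting the source model under test and q a generic perturbation model such as T5 mask-filling:

```latex
d(x, p_\theta, q) \;=\; \log p_\theta(x) \;-\; \mathbb{E}_{\tilde{x} \sim q(\cdot \mid x)}\big[\log p_\theta(\tilde{x})\big]
```

The hypothesis is that d is large and positive for passages sampled from p_θ (negative curvature around the sample) and close to zero for human-written text; in practice the expectation is approximated by averaging over a small number of sampled perturbations.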

DetectGPT Algorithm

The algorithm perturbs a candidate passage several times with a generic pre-trained mask-filling model (e.g., T5) and computes the perturbation discrepancy: the log probability of the original passage under the model of interest minus the average log probability of the perturbed versions. If this discrepancy is sufficiently large, the passage is flagged as machine-generated. Operating in this zero-shot fashion, DetectGPT improves AUROC significantly over existing zero-shot baselines, as sketched below.
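A minimal sketch of this procedure, assuming the Hugging Face transformers library; the model names, span-masking heuristic, perturbation count, and threshold below are illustrative assumptions rather than the authors' exact implementation:

```python
# Minimal sketch of the DetectGPT criterion (illustrative, not the authors' code).
# Assumptions: GPT-2 is the source model under test, T5 is the generic
# mask-filling perturbation model, and the span-masking heuristic is simplified.
import random
import re

import torch
from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                          AutoTokenizer)

score_tok = AutoTokenizer.from_pretrained("gpt2")
score_model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
fill_tok = AutoTokenizer.from_pretrained("t5-large")
fill_model = AutoModelForSeq2SeqLM.from_pretrained("t5-large").eval()


def avg_log_prob(text: str) -> float:
    """Average per-token log probability of `text` under the source model."""
    ids = score_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = score_model(ids, labels=ids).loss  # mean negative log-likelihood
    return -loss.item()


def perturb(text: str, mask_frac: float = 0.15, span_len: int = 2) -> str:
    """Mask a few short word spans and refill them with T5."""
    words = text.split()
    candidates = range(max(len(words) - span_len, 0))
    n_spans = min(max(1, int(mask_frac * len(words) / span_len)), len(candidates))
    starts = sorted(random.sample(candidates, n_spans))
    masked, cursor, sid = [], 0, 0
    for s in starts:
        if s < cursor:  # skip overlapping spans
            continue
        masked += words[cursor:s] + [f"<extra_id_{sid}>"]
        cursor, sid = s + span_len, sid + 1
    masked += words[cursor:]
    ids = fill_tok(" ".join(masked), return_tensors="pt").input_ids
    gen = fill_model.generate(ids, max_new_tokens=3 * max(sid, 1), do_sample=True)
    raw = fill_tok.decode(gen[0], skip_special_tokens=False)
    fills = [f.replace("</s>", "").replace("<pad>", "").strip()
             for f in re.split(r"<extra_id_\d+>", raw)[1:]]
    out = " ".join(masked)
    for i, fill in enumerate(fills):
        out = out.replace(f"<extra_id_{i}>", fill, 1)
    return out


def perturbation_discrepancy(text: str, n_perturbations: int = 20) -> float:
    """DetectGPT statistic: log p(x) minus the mean log p of perturbed variants."""
    original = avg_log_prob(text)
    perturbed = [avg_log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)


# A large positive discrepancy suggests the passage was sampled from the source
# model; the threshold below is a placeholder chosen on held-out data in practice.
if perturbation_discrepancy("Some candidate passage to test ...") > 0.1:
    print("likely machine-generated by the source model")
```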

Empirical Validation

The authors substantiate their claims through experiments across multiple datasets and LLMs such as GPT-2, GPT-Neo, and GPT-J. DetectGPT outperforms existing zero-shot detection baselines on samples from these models, achieving higher AUROC scores, and it remains effective when applied to larger models such as GPT-NeoX and GPT-3 (Figure 2).

Figure 2: Distributions of perturbation discrepancies are consistently higher for model-generated texts than for human texts, indicating a detectable characteristic.
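The reported results are AUROC scores computed over sets of model-generated and human-written passages. A simple way to reproduce that style of evaluation, assuming scikit-learn and the hypothetical `perturbation_discrepancy` helper sketched above:

```python
from sklearn.metrics import roc_auc_score

# Placeholder data: in practice these are passages sampled from the model under
# test and human-written passages drawn from the same domain.
machine_texts = ["..."]  # model samples
human_texts = ["..."]    # human-written passages

# Score every passage with the (hypothetical) DetectGPT statistic from above.
scores = [perturbation_discrepancy(t) for t in machine_texts + human_texts]
labels = [1] * len(machine_texts) + [0] * len(human_texts)
print("AUROC:", roc_auc_score(labels, scores))
```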

Performance and Implications

DetectGPT showcases strong detection capabilities relative to supervised detectors, particularly in out-of-domain scenarios, underscoring its practicality in varied real-world applications. The paper also discusses the trade-off between detection accuracy and computational cost: each detection requires scoring multiple perturbed versions of the passage, so the number of perturbations can be tuned to balance accuracy against efficiency.
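One way to examine that trade-off empirically is to sweep the number of perturbations and track both detection quality and wall-clock cost; the loop below is illustrative and reuses the hypothetical helpers and placeholder data from the sketches above:

```python
import time

from sklearn.metrics import roc_auc_score

# Fewer perturbations mean fewer forward passes per passage but a noisier
# estimate of the discrepancy; more perturbations cost more compute.
labels = [1] * len(machine_texts) + [0] * len(human_texts)
for k in (1, 5, 20, 100):
    start = time.time()
    scores = [perturbation_discrepancy(t, n_perturbations=k)
              for t in machine_texts + human_texts]
    print(f"k={k:>3}  AUROC={roc_auc_score(labels, scores):.3f}  "
          f"time={time.time() - start:.1f}s")
```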

Discussion and Future Work

The discussion highlights DetectGPT's potential to be used in conjunction with watermarking techniques to increase the robustness of detection. Future directions include exploring detection mechanisms for non-textual generative models, such as those producing multimedia content (Figure 3).

Figure 3: Performance degradation of top methods under increased revisions, where DetectGPT robustly maintains accuracy.

Conclusion

DetectGPT offers a novel and efficient approach to machine-generated text detection, using insights about the structure of an LLM's log probability function without relying on additional training data or a separately trained detector. It sets a foundation for further exploration of generative model behavior, with potential impact across domains that depend on authenticity verification (Figure 4).

Figure 4: Illustration of the DetectGPT detection procedure and its overall flow.
