Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text (2401.09407v3)
Abstract: With the recent proliferation of LLMs, there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems. First, existing detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human- and machine-generated text. Based on these findings, we introduce a novel system, T5LLMCipher, which detects machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering, targeting text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability: an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top-performing existing approaches, and correct attribution of the generator of a text with an accuracy of 93.6%.
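The pipeline the abstract describes, encoding texts with a pretrained T5 encoder, sub-clustering the resulting embeddings into generator-like groups, and inspecting the space with t-SNE, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the T5 encoding step is replaced by synthetic vectors so the sketch runs without the transformers library, and the embedding dimension, cluster count, and t-SNE settings are assumptions rather than the authors' configuration.

```python
# Hedged sketch: sub-cluster "machine-generated" embeddings (stand-ins for
# T5 encoder outputs) and project them to 2-D with t-SNE for inspection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-ins for T5 encoder embeddings: three "generators", each producing
# 50 texts, embedded as 32-d vectors with generator-specific means.
embeddings = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(50, 32)) for c in (-2.0, 0.0, 2.0)
])

# Sub-cluster the machine-generated embeddings; the cluster assignments
# can serve as pseudo-labels for generator-aware training or attribution.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
sub_labels = kmeans.labels_

# Project the embedding space to 2-D for a t-SNE visualization, as in the
# paper's qualitative analysis of human vs. machine separability.
coords = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(embeddings)

print(coords.shape)        # 2-D coordinates, one row per text
print(len(set(sub_labels)))  # number of discovered sub-clusters
```

In the actual system, `embeddings` would come from mean-pooled T5 encoder hidden states, and the sub-cluster structure is what lets the detector move beyond a flat human-vs-machine binary toward generator attribution.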