Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text (2401.09407v3)
Abstract: With the recent proliferation of LLMs, there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems: First, existing detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human and machine-generated text. Based on our findings, we introduce a novel system, T5LLMCipher, for detecting machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering to address text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability, with an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top-performing existing approaches, and correctly attributes the generator of a text with an accuracy of 93.6%.
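The abstract's analysis pipeline — extracting encoder embeddings, sub-clustering them, and projecting them with t-SNE for inspection — can be illustrated with a minimal sketch. This is not the paper's implementation: synthetic vectors stand in for T5 encoder embeddings, and plain KMeans stands in for the paper's sub-clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-ins for 64-dim encoder embeddings from two generator
# "families" with distinct means (100 samples each).
emb_a = rng.normal(loc=0.0, scale=1.0, size=(100, 64))
emb_b = rng.normal(loc=5.0, scale=1.0, size=(100, 64))
embeddings = np.vstack([emb_a, emb_b])

# Sub-cluster the embedding space; KMeans is an illustrative stand-in
# for the paper's LLM embedding sub-clustering.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
labels = kmeans.labels_

# Project to 2-D with t-SNE for visual inspection, mirroring the
# abstract's qualitative analysis of encoder embeddings.
proj = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(proj.shape)  # one 2-D point per input embedding
```

In the actual system, the embeddings would come from a pretrained T5 encoder (e.g. via Hugging Face `transformers`) applied to human- and machine-generated passages, with one sub-cluster analysis per generator family.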