Deciphering Textual Authenticity: A Generalized Strategy through the Lens of Large Language Semantics for Detecting Human vs. Machine-Generated Text (2401.09407v3)
Abstract: With the recent proliferation of LLMs, there has been an increasing demand for tools to detect machine-generated text. Effective detection of machine-generated text faces two pertinent problems. First, existing detectors are severely limited in generalizing to real-world scenarios, where machine-generated text is produced by a variety of generators, including but not limited to GPT-4 and Dolly, and spans diverse domains, ranging from academic manuscripts to social media posts. Second, existing detection methodologies treat texts produced by LLMs through a restrictive binary classification lens, neglecting the nuanced diversity of artifacts generated by different LLMs. In this work, we undertake a systematic study of the detection of machine-generated text in real-world scenarios. We first study the effectiveness of state-of-the-art approaches and find that they are severely limited against text produced by diverse generators and domains in the real world. Furthermore, t-SNE visualizations of the embeddings from a pretrained LLM's encoder show that they cannot reliably distinguish between human- and machine-generated text. Based on these findings, we introduce a novel system, T5LLMCipher, which detects machine-generated text using a pretrained T5 encoder combined with LLM embedding sub-clustering, targeting text produced by diverse generators and domains in the real world. We evaluate our approach across 9 machine-generated text systems and 9 domains and find that it provides state-of-the-art generalization ability: an average increase in F1 score on machine-generated text of 19.6% on unseen generators and domains compared to the top-performing existing approaches, and correct attribution of the generator of a text with an accuracy of 93.6%.
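The pipeline the abstract describes, encoding texts with a pretrained T5 encoder, sub-clustering the resulting embeddings into generator-like groups, and inspecting the space with t-SNE, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the T5 encoding step is replaced by synthetic vectors so the sketch runs without the transformers library, and the embedding dimension, cluster count, and t-SNE settings are assumptions rather than the authors' configuration.

```python
# Hedged sketch: sub-cluster "machine-generated" embeddings (stand-ins for
# T5 encoder outputs) and project them to 2-D with t-SNE for inspection.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-ins for T5 encoder embeddings: three "generators", each producing
# 50 texts, embedded as 32-d vectors with generator-specific means.
embeddings = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(50, 32)) for c in (-2.0, 0.0, 2.0)
])

# Sub-cluster the machine-generated embeddings; the cluster assignments
# can serve as pseudo-labels for generator-aware training or attribution.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
sub_labels = kmeans.labels_

# Project the embedding space to 2-D for a t-SNE visualization, as in the
# paper's qualitative analysis of human vs. machine separability.
coords = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(embeddings)

print(coords.shape)        # 2-D coordinates, one row per text
print(len(set(sub_labels)))  # number of discovered sub-clusters
```

In the actual system, `embeddings` would come from mean-pooled T5 encoder hidden states, and the sub-cluster structure is what lets the detector move beyond a flat human-vs-machine binary toward generator attribution.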