Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level (2306.08122v1)
Abstract: The increasing reliance on LLMs in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using NLP techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.
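The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the LLM answers to the paraphrased questions have already been generated and split into sentences. The abstract does not specify the embedding model, so `embed` here is a toy bag-of-words stand-in rather than the contrastively trained representation the paper describes. The function names and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector; a stand-in (assumption) for a learned sentence embedding."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values()) * sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def sentence_scores(student_sentences, generated_sentences):
    """For each student sentence, find the most similar LLM-generated sentence."""
    pool = [(g, embed(g)) for g in generated_sentences]
    results = []
    for s in student_sentences:
        vec = embed(s)
        best_text, best_sim = None, 0.0
        for g, gvec in pool:
            sim = cosine(vec, gvec)
            if sim > best_sim:
                best_text, best_sim = g, sim
        results.append((s, best_text, best_sim))
    return results


def document_score(results, threshold=0.8):
    """Fraction of student sentences whose best match meets the hypothetical threshold."""
    if not results:
        return 0.0
    return sum(1 for _, _, sim in results if sim >= threshold) / len(results)
```

Each tuple returned by `sentence_scores` gives a sentence-level metric a human evaluator can inspect (the student sentence, its closest generated sentence, and the similarity), while `document_score` aggregates those into a single document-level figure. Swapping `embed` for a real sentence encoder would leave the rest of the sketch unchanged.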