Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection (2103.12450v5)
Published 23 Mar 2021 in cs.CL, cs.AI, and cs.DL
Abstract: The rise of neural language models such as BERT enables high-quality text paraphrasing. This poses a problem for academic integrity, as it is difficult to distinguish original from machine-paraphrased content. We propose a benchmark of articles paraphrased using recent Transformer-based language models. Our contribution fosters future research on paraphrase detection systems: it offers a large collection of aligned original and paraphrased documents, a study of their structure, and classification experiments with state-of-the-art systems. We make our findings publicly available.