Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts (2405.01660v1)

Published 2 May 2024 in cs.CL and cs.AI

Abstract: Recent LLMs have shown the ability to generate content that is difficult or impossible to distinguish from human writing. We investigate the ability of differently-sized LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts, thoughts that may occur during mundane activities. We compare GPT-2 and GPT-Neo fine-tuned on Reddit data as well as GPT-3.5 invoked in a zero-shot manner, against human-authored texts. We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts. Additionally, we compare the ability of humans versus fine-tuned RoBERTa classifiers to detect AI-generated texts. We conclude that human evaluators rate the generated texts slightly worse on average regarding their creative quality, but they are unable to reliably distinguish between human-written and AI-generated texts. We further provide a dataset for creative, witty text generation based on Reddit Showerthoughts posts.

Summary

The paper demonstrates that fine-tuned LLMs like GPT-Neo and even zero-shot GPT-3.5 produce texts nearly indistinguishable from human writing.
Human evaluators rate the texts on creativity and wit, and both humans and fine-tuned RoBERTa classifiers are tested on detecting AI-generated content.
The study highlights practical implications for scalable creative content generation while underscoring risks such as misinformation and spam.

Understanding AI's Ability to Mimic Human Creativity in Text: Insights from LLMs on Reddit Showerthoughts

Introduction: The Quest for Creative AI Texts

Generating creative, engaging text remains a significant challenge for AI. The paper explores how differently sized LLMs (GPT-2, GPT-Neo, and GPT-3.5) emulate the creativity, wit, and humor found in Reddit's Showerthoughts community. The subreddit collects brief, clever musings that often arise during mundane tasks, which makes it a useful testbed for evaluating the subtleties of AI-generated text.

Experiment Setup

The study combined four components:

Fine-tuning LLMs: GPT-2 and GPT-Neo were fine-tuned on a curated dataset of Showerthoughts posts to capture the community's distinctive style and creativity.
Zero-shot Text Generation with GPT-3.5: GPT-3.5, a larger model, was prompted without any fine-tuning on Showerthoughts to see how well it could adapt to the style.
Comparative Analysis: Human-written and AI-generated texts were compared on creativity, humor, cleverness, and overall quality.
Human vs. AI Detection: The paper tested whether human evaluators or machine learning classifiers (fine-tuned RoBERTa models) were better at distinguishing AI-generated from human-written texts.
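The comparative analysis can be pictured as averaging evaluator scores per text source and per quality dimension. The snippet below is a minimal sketch under stated assumptions: the function name `mean_ratings`, the tuple layout, the numeric scale, and every score in `example_ratings` are hypothetical and invented for illustration. None of them are taken from the paper or its dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_ratings(ratings):
    """Average scores per (source, dimension).

    `ratings` is an iterable of (source, dimension, score) tuples.
    This layout is an assumption made for this sketch.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for source, dimension, score in ratings:
        buckets[source][dimension].append(score)
    # Convert the nested lists of scores into nested means.
    return {
        source: {dim: mean(scores) for dim, scores in dims.items()}
        for source, dims in buckets.items()
    }


# Toy ratings, invented for illustration only (not results from the paper).
example_ratings = [
    ("human", "creativity", 4),
    ("human", "creativity", 5),
    ("gpt-neo", "creativity", 3),
    ("gpt-neo", "creativity", 4),
    ("human", "humor", 3),
    ("gpt-neo", "humor", 3),
]

summary = mean_ratings(example_ratings)
for source, dims in summary.items():
    print(source, dims)
```

Grouping by source first makes it easy to put human-written and model-generated texts side by side for each dimension.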

Results and Observations

AI Performance in Generating Creative Text

LLMs produced texts that human evaluators found difficult to distinguish from those written by humans. This reveals both the power and the potential pitfalls of using AI in content creation.
GPT-3.5, even without specific fine-tuning, generated high-quality outputs, though fine-tuned models like GPT-Neo aligned marginally more closely with the traits of successful Showerthoughts.

Detection of AI-Written Texts

Human Evaluators: Participants struggled to consistently tell AI-generated from human-written texts; in some cases their accuracy was only slightly better than random guessing.
Machine Learning Classifiers: RoBERTa classifiers outperformed humans at classifying the origin of the texts. This highlights how machine learning can be instrumental in identifying AI-generated content.
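To make "slightly better than random guessing" concrete, a detector's accuracy on a balanced human-vs-AI task can be compared against the 50% chance level with an exact one-sided binomial test. This is a minimal sketch, not the paper's evaluation procedure: the function names, the `"ai"`/`"human"` labels, and the toy inputs are assumptions made for illustration.

```python
from math import comb


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def p_value_above_chance(correct, total, chance=0.5):
    """One-sided exact binomial test.

    Returns P(X >= correct) for X ~ Binomial(total, chance), i.e. how
    likely a pure guesser is to score at least this well.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Toy example, invented for illustration: 3 of 4 guesses are right.
toy_predictions = ["ai", "human", "ai", "human"]
toy_labels = ["ai", "ai", "ai", "human"]
toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_accuracy)

# A hypothetical evaluator with 55 correct answers out of 100.
print(p_value_above_chance(55, 100))
```

A large p-value means the observed accuracy is consistent with guessing, which is the situation described for human evaluators; a detector that truly separates the two classes would yield a very small one.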

Theoretical and Practical Implications

From a theoretical perspective, this research extends our understanding of AI's capability to mimic human creativity, a dimension of AI that is fascinating yet fraught with challenges. The ability of LLMs to generate contextually rich and varied text, as illustrated with Reddit Showerthoughts, makes this a promising area for further exploration of AI-generated content.

On a practical level, these findings are particularly relevant for fields like digital marketing, entertainment, or any domain reliant on creative content generation. Businesses could use these models to generate innovative and relatable content at scale. However, the potential for misuse, such as generating misinformation or spam, underscores the need for robust detection mechanisms.

Speculations on Future AI Developments

The trajectory of LLM research hints at even more sophisticated models that could offer greater creativity and nuance in text generation. Future studies might explore even finer aspects of humor and wit or extend to different forms of creative expression such as poetry or prose.

Moreover, as AI-generated content becomes harder to detect, the development of more advanced detection tools will be crucial. These tools would need not only to keep pace with AI capabilities but also to adapt to new, unforeseen methods of text generation.

Conclusion

The exploration of AI's ability to replicate human-like creativity in texts reveals both exciting possibilities and notable challenges. While AI can now generate text that closely mirrors human writing, distinguishing these AI-generated texts from human-written ones remains a significant hurdle. This double-edged nature of AI capabilities ensures that understanding and leveraging AI in creative domains will remain a dynamic and ongoing endeavor.