Divergent Creativity in Humans and Large Language Models (2405.13012v2)

Published 13 May 2024 in cs.CL and cs.AI

Abstract: The recent surge of LLMs has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece that has been missing in this discourse is a systematic evaluation of LLMs' semantic diversity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in computational creativity to analyze semantic divergence in both state-of-the-art LLMs and a substantial dataset of 100,000 humans. We found evidence that LLMs can surpass average human performance on the Divergent Association Task, and approach human creative writing abilities, though they fall short of the typical performance of highly creative humans. Notably, even the top performing LLMs are still largely surpassed by highly creative individuals, underscoring a ceiling that current LLMs still fail to surpass. Our human-machine benchmarking framework addresses the polemic surrounding the imminent replacement of human creative labour by AI, disentangling the quality of the respective creative linguistic outputs using established objective measures. While prompting deeper exploration of the distinctive elements of human inventive thought compared to those of AI systems, we lay out a series of techniques to improve their outputs with respect to semantic diversity, such as prompt design and hyper-parameter tuning.

Summary

  • The paper introduces a benchmarking framework using the Divergent Association Task (DAT) and other metrics to quantitatively compare divergent creativity in humans and large language models (LLMs).
  • The study found that certain LLMs, particularly GPT-4, can match or exceed average human performance on semantic creativity tasks such as the DAT, with results strongly influenced by hyperparameter settings such as temperature.
  • Findings suggest that LLMs can produce creative outputs comparable to those of average humans, while still trailing highly creative individuals, highlighting both the potential for synergy between artificial and human creativity and the value of quantifiable assessment.

The paper "Divergent Creativity in Humans and LLMs" presents an extensive evaluation of divergent creativity in both human participants and LLMs, utilizing a framework grounded in creativity science. It addresses pivotal questions about the creative capabilities of LLMs in comparison to human creativity, with a specific focus on quantifying creativity rather than subjective evaluation.

Key Contributions and Methodology

  1. Benchmarking Creativity:
    • The paper introduces a benchmarking framework comparing LLMs and human creativity using the Divergent Association Task (DAT) and other creativity metrics like Divergent Semantic Integration (DSI) and Lempel-Ziv (LZ) complexity.
  2. Divergent Association Task (DAT):
    • DAT requires generating words that are as semantically dissimilar from one another as possible. The paper uses responses from 100,000 human participants as a reference to evaluate LLMs such as GPT-4 and Gemini Pro; a minimal scoring sketch follows this list.
  3. LLM Tuning and Strategy:
    • The research examines the impact of hyperparameter changes, such as temperature adjustments, and of prompt engineering on LLM performance. Higher temperatures yield more diverse outputs and higher creativity scores (see the sampling sketch after this list).
    • Prompting strategies, such as invoking etymology or thesaurus-style keywords, substantially enhance task performance.
  4. Comparative Analysis:
    • GPT-4 not only matches but exceeds average human creativity scores on certain tasks, notably outperforming the other LLMs on the DAT.
    • Differences in lexical choices, as shown by word frequency analyses, highlight contrasting performance levels among LLMs.
    • The analysis also explores semantic distance, enabling a direct comparison between LLM and human word-generation strategies.
  5. Creativity in Writing:
    • The paper further extends the analysis to creative writing tasks, such as generating haikus and short narratives. The results indicate that while LLMs achieve high creativity scores and approach typical human performance, highly creative human participants still outperform them on these tasks.
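
As referenced in item 2, DAT scoring reduces to the average pairwise cosine distance between embeddings of the generated words, scaled by 100 as in the original DAT metric. The sketch below is a minimal illustration, assuming GloVe vectors loaded through gensim; the embedding model, word validation, and preprocessing used in the paper may differ.

```python
# Minimal DAT-style scoring sketch: mean pairwise cosine distance
# between static word embeddings, x100. The GloVe model name and the
# use of gensim are illustrative choices, not the paper's exact setup.
from itertools import combinations

import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-300")  # large one-time download

def dat_score(words: list[str]) -> float:
    """Average pairwise cosine distance between word vectors, x100."""
    vecs = [model[w.lower()] for w in words]  # raises KeyError if a word is out of vocabulary
    dists = [
        1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        for a, b in combinations(vecs, 2)
    ]
    return 100 * float(np.mean(dists))

print(dat_score(["cat", "galaxy", "vinegar", "democracy",
                 "saxophone", "glacier", "algorithm"]))
```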

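The temperature effect noted in item 3 comes from how sampling works: logits are divided by the temperature before the softmax, so higher values flatten the next-token distribution and make less probable (more surprising) tokens likelier to appear. The NumPy toy below illustrates only this mechanism; it is not the sampling code of any particular model or provider.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float,
                            rng: np.random.Generator) -> int:
    """Sample a token index after scaling logits by 1/temperature.

    Higher temperature flattens the softmax distribution, so rarer
    (more 'surprising') tokens are sampled more often.
    """
    z = logits / temperature
    z = z - z.max()                        # for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.2])         # toy vocabulary of 3 tokens
for t in (0.2, 1.0, 2.0):
    draws = [sample_with_temperature(logits, t, rng) for _ in range(1000)]
    print(t, np.bincount(draws, minlength=3) / 1000)
```

At a temperature of 0.2 the sampler almost always picks the top token; at 2.0 the three options are drawn far more evenly, which is the diversity effect the paper exploits.
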
Major Findings

  • Performance and Variability:
    • The paper reveals variability in LLM performance based on model size and structure. For instance, smaller models occasionally outperform larger counterparts in specific contexts.
    • Results underscore the substantial influence of LLM architecture and tuning on creativity measures.
  • Semantic Distance and Contextual Embeddings:
    • Semantic distance, computed via cosine similarity of word embeddings, is used to show how the creativity of responses varies across models.
    • Different measures capture different facets of linguistic creativity, with DSI and LZ complexity complementing the DAT findings; sketches of both follow this list.
  • Creative Capabilities Beyond Humans:
    • The research corroborates the notion that certain LLMs, notably GPT-4, outperform the average human in semantic creativity, though this advantage does not extend to all creative dimensions, and highly creative individuals still surpass them.
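
The two text-level measures named above can be sketched briefly. The DSI version below follows the published idea of averaging pairwise cosine distances between contextual token embeddings, here using bert-base-uncased and including special tokens for brevity; the paper's exact model, layer choice, and pooling may differ. The LZ function is an LZ78-style phrase count, a simple stand-in for the compression-based complexity measure; it is not necessarily the exact variant the paper uses.

```python
# Illustrative sketches of DSI and LZ complexity under the
# assumptions stated above.
from itertools import combinations

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def dsi(text: str) -> float:
    """Mean pairwise cosine distance between contextual token
    embeddings (special tokens included here for brevity)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state.squeeze(0)  # (tokens, dim)
    dists = [
        1 - torch.nn.functional.cosine_similarity(a, b, dim=0).item()
        for a, b in combinations(hidden, 2)
    ]
    return sum(dists) / len(dists)

def lz_complexity(text: str) -> int:
    """Count distinct phrases in an LZ78-style parse of the text:
    more novel, less repetitive text yields more phrases."""
    phrases, current = set(), ""
    for ch in text:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    return len(phrases) + (1 if current else 0)

sample = "An old silent pond. A frog jumps into the pond. Splash!"
print(dsi(sample))
print(lz_complexity(sample))
```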

Broader Implications

  • The paper posits that LLMs can achieve creative outputs comparable to those of average humans, but highlights the importance of model-specific tuning and prompt strategy. It advances the emerging discourse on machine creativity by prioritizing quantifiable metrics over subjective assessment.
  • The findings suggest a prospective synergy between LLMs and human creativity, emphasizing the importance of understanding artificial creativity's implications for cognition and innovation.

Limitations and Future Work

  • The research acknowledges limitations stemming from the limited public access to architectural and fine-tuning details of certain LLMs, and notes that continual re-evaluation is needed given the rapid pace of development in the field.
  • Future investigations could explore convergent thinking and the integration of subjective human evaluations to complement the quantitative metrics used in this paper, potentially providing a more comprehensive understanding of LLM creativity.

By building this framework, the paper not only evaluates current models but also provides a foundation for future research and the development of more sophisticated creative AI systems.