
As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

(2403.16760)
Published Mar 25, 2024 in cs.HC, cs.AI, cs.SD, and eess.AS

Abstract

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurately people could distinguish synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, and all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that, overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when stimuli contain synthetic content rather than authentic content, when images feature human faces rather than non-face objects, when stimuli comprise a single modality rather than multiple modalities, when audiovisual stimuli have mixed authenticity rather than being fully synthetic, and when stimuli feature foreign languages rather than languages the observer is fluent in. Finally, we find that participants' prior knowledge of synthetic media does not meaningfully affect their detection performance. Collectively, these results indicate that people are highly susceptible to being deceived by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective defense.

Overview

  • This paper analyzes human ability to distinguish between real and AI-generated media, including images, videos, and audio, through a survey of 1276 participants.

  • Findings show an average detection accuracy of 51.2%, essentially chance level, with variations influenced by content type, modality, and language familiarity.

  • Detection rates were higher for audiovisual stimuli and content in familiar languages, but not significantly affected by self-reported familiarity with synthetic media.

  • The study highlights the limitations of relying solely on human perception to combat synthetic media deceptions, suggesting a need for enhanced education and technological countermeasures.

Human Detection Performance in Identifying Synthetic Media

Introduction

In recent years, the proliferation of generative AI technologies has significantly enhanced the realism of synthetic media, which includes images, videos, and audio. This advancement raises profound concerns regarding the application of these technologies for malicious purposes such as disinformation campaigns, financial fraud, and privacy violations. Traditionally, the primary defense against such deceptions has been the human capacity to recognize artificial constructs. The study conducted by Di Cooke, Abigail Edwards, Sophia Barkoff, and Kathryn Kelly focuses on evaluating this capacity by analyzing human detection performance across various media types, specifically images, audio-only, video-only, and audiovisual stimuli.

Methodology

The study employed a series of perceptual surveys involving 1276 participants to evaluate how effectively humans differentiate between authentic and synthetic media. Testing conditions were designed to simulate a typical online environment, and all synthetic stimuli were produced with publicly accessible generative AI tools, ensuring their relevance to what individuals might encounter in everyday online interactions. This approach provides a more realistic assessment of human detection capabilities 'in the wild'. The research examined the influence of media type, authenticity, subject matter, modality, and language familiarity on detection rates.
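
As a concrete illustration of the kind of analysis such a survey supports, below is a minimal sketch of computing overall and per-modality detection accuracy from trial-level responses. The DataFrame schema and column names here are assumptions made for illustration, not the paper's actual data format.

```python
import pandas as pd

# Hypothetical trial-level responses; the column names are assumptions,
# not the study's real data schema.
trials = pd.DataFrame({
    "participant":      [1, 1, 2, 2, 3, 3],
    "modality":         ["image", "audio", "image", "video", "audiovisual", "audio"],
    "is_synthetic":     [True, False, True, True, False, True],
    "judged_synthetic": [True, True, False, True, False, False],
})

# A trial is correct when the participant's judgment matches ground truth.
trials["correct"] = trials["is_synthetic"] == trials["judged_synthetic"]

# Overall accuracy plus a per-modality breakdown, mirroring the kind of
# comparison the study reports (e.g., single- vs. multi-modal stimuli).
print("overall accuracy:", trials["correct"].mean())
print(trials.groupby("modality")["correct"].mean())
```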

Findings

Key findings from the study include:

  • Overall Detection Performance: On average, participants correctly distinguished synthetic from authentic content 51.2% of the time, essentially chance-level accuracy (a minimal significance check against the 50% baseline is sketched after this list).
  • Influence of Synthetic Content and Human Faces: Detection performance decreased with synthetic content and when images featured human faces, underscoring the challenge in discerning AI-generated human likenesses.
  • Comparison across Modalities: Audiovisual stimuli led to higher detection accuracy than single-modality stimuli, highlighting the additive benefit of multimodal information in discernment tasks.
  • Impact of Language Familiarity: Detection success increased when stimuli included languages participants were fluent in, emphasizing the role of language familiarity in synthetic media detection.
  • Prior Knowledge of Synthetic Media: Interestingly, participants' self-reported familiarity with synthetic media did not correlate with detection performance, suggesting either a general inadequacy in public knowledge or the sophistication of synthetic media rendering perceptual cues ineffective.
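
To make the "chance-level" claim concrete, the sketch below runs an exact binomial test of the reported 51.2% mean accuracy against the 50% guessing baseline. The trial count (one trial per participant, giving 1276 trials) is a hypothetical simplification rather than the paper's actual design; whether a small deviation from 50% is statistically detectable depends entirely on that count.

```python
from scipy.stats import binomtest

# Hypothetical: treat each of the 1276 participants as a single trial.
# The study's real trials-per-participant count is not assumed here.
n_trials = 1276
n_correct = round(0.512 * n_trials)  # 653 correct responses

# Two-sided exact binomial test against the 50% chance baseline.
result = binomtest(n_correct, n_trials, p=0.5)
print(f"accuracy = {n_correct / n_trials:.3f}, p-value = {result.pvalue:.3f}")
# Under these assumptions the p-value is well above 0.05, consistent with
# accuracy that is statistically indistinguishable from guessing.
```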

Implications

This study underscores the limitations of human perceptual capabilities as a standalone defense against synthetic media deceptions. Despite the participants' near-chance success rate, the nuances in detection performance across different media types and under varying conditions provide critical insights. For instance, the observed variability based on modality and language suggests potential avenues for enhancing education and training programs focused on synthetic media identification. However, the consistent difficulty across scenarios points to an urgent need for sophisticated, non-perceptual countermeasures. These could include advanced machine learning detectors, blockchain-based content authentication, and comprehensive digital literacy initiatives aimed at fostering critical analysis of online content.
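
As one illustration of a non-perceptual countermeasure, the sketch below shows provenance-style content authentication with digital signatures, using the Ed25519 primitives from the Python `cryptography` package. This is a generic signing example under assumed keys and media bytes, not the C2PA standard or the specific blockchain-based approach mentioned above; a real system would also need key distribution and metadata binding.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A publisher signs media bytes at creation time; any later alteration
# (e.g., a synthetic substitution) invalidates the signature.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

media_bytes = b"...raw image, audio, or video bytes..."
signature = private_key.sign(media_bytes)

# A consumer verifies the bytes against the publisher's public key.
try:
    public_key.verify(signature, media_bytes)
    print("authentic: content matches the publisher's signature")
except InvalidSignature:
    print("warning: content was altered after signing")
```

The design point is that authenticity is established by the publisher at creation time rather than inferred by the viewer's eyes and ears, sidestepping the perceptual limits the study documents.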

Future Outlook

Given the rapid advancement of AI and machine learning technologies, the already narrow gap between genuine and synthetic media is likely to diminish further. This trajectory implies that human detection capabilities, without the aid of technological tools, may become increasingly insufficient. Future research should therefore emphasize not only improving our understanding of how people perceive deceptive digital content but also expediting the development of reliable, scalable, and user-friendly technologies for identifying synthetic media. Moreover, exploring educational interventions that can adapt to the evolving landscape of digital content creation will be critical in empowering individuals to navigate the complexities of modern media consumption safely and responsibly.

The collective results from this study paint a sobering picture of the current state of human vulnerability to synthetic media deceptions. They call for a multi-faceted approach that blends technological, educational, and regulatory efforts to safeguard individuals and societies against the potential harms of increasingly indistinguishable synthetic content.
