
As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

(2403.16760)
Published Mar 25, 2024 in cs.HC, cs.AI, cs.SD, and eess.AS

Abstract

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurately people could distinguish synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, and all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that, overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when stimuli contain synthetic content rather than authentic content, when images feature human faces rather than non-face objects, when stimuli comprise a single modality rather than multiple modalities, when audiovisual stimuli have mixed authenticity rather than being fully synthetic, and when stimuli feature foreign languages rather than languages the observer is fluent in. Finally, we find that participants' prior knowledge of synthetic media does not meaningfully affect their detection performance. Collectively, these results indicate that people are highly susceptible to being deceived by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective defense.

Overview

  • This paper analyzes human ability to distinguish between real and AI-generated media, including images, videos, and audio, through a survey of 1276 participants.

  • Findings show an average detection accuracy of 51.2%, essentially chance level, with variations influenced by content type, modality, and language familiarity.

  • Detection rates were higher for audiovisual stimuli and content in familiar languages, but not significantly affected by self-reported familiarity with synthetic media.

  • The study highlights the limitations of relying solely on human perception to combat synthetic media deceptions, suggesting a need for enhanced education and technological countermeasures.

Human Detection Performance in Identifying Synthetic Media

Introduction

In recent years, the proliferation of generative AI technologies has significantly enhanced the realism of synthetic media, which includes images, videos, and audio. This advancement raises profound concerns regarding the application of these technologies for malicious purposes such as disinformation campaigns, financial fraud, and privacy violations. Traditionally, the primary defense against such deceptions has been the human capacity to recognize artificial constructs. The study conducted by Di Cooke, Abigail Edwards, Sophia Barkoff, and Kathryn Kelly focuses on evaluating this capacity by analyzing human detection performance across various media types, specifically images, audio-only, video-only, and audiovisual stimuli.

Methodology

The study employed a series of perceptual surveys involving 1276 participants to evaluate how effectively humans differentiate between authentic and synthetic media. Testing conditions were designed to simulate a typical online environment, and all synthetic stimuli were produced with publicly accessible generative AI tools, ensuring their relevance to what individuals might encounter in everyday online interactions. This approach provides a more realistic assessment of human detection capabilities 'in the wild'. The research examined the influence of media type, authenticity, subject matter, modality, and language familiarity on detection rates.
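
As a concrete illustration of the kind of analysis such a survey supports, below is a minimal sketch of computing overall and per-modality detection accuracy from trial-level responses. The DataFrame schema and column names here are assumptions made for illustration, not the paper's actual data format.

```python
import pandas as pd

# Hypothetical trial-level responses; the column names are assumptions,
# not the study's real data schema.
trials = pd.DataFrame({
    "participant":      [1, 1, 2, 2, 3, 3],
    "modality":         ["image", "audio", "image", "video", "audiovisual", "audio"],
    "is_synthetic":     [True, False, True, True, False, True],
    "judged_synthetic": [True, True, False, True, False, False],
})

# A trial is correct when the participant's judgment matches ground truth.
trials["correct"] = trials["is_synthetic"] == trials["judged_synthetic"]

# Overall accuracy plus a per-modality breakdown, mirroring the kind of
# comparison the study reports (e.g., single- vs. multi-modal stimuli).
print("overall accuracy:", trials["correct"].mean())
print(trials.groupby("modality")["correct"].mean())
```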

Findings

Key findings from the study include:

  • Overall Detection Performance: On average, participants correctly distinguished synthetic from authentic content 51.2% of the time, essentially chance-level accuracy (a minimal significance check against the 50% baseline is sketched after this list).
  • Influence of Synthetic Content and Human Faces: Detection performance decreased with synthetic content and when images featured human faces, underscoring the challenge in discerning AI-generated human likenesses.
  • Comparison across Modalities: Audiovisual stimuli led to higher detection accuracy than single-modality stimuli, highlighting the additive benefit of multimodal information in discernment tasks.
  • Impact of Language Familiarity: Detection success increased when stimuli included languages participants were fluent in, emphasizing the role of language familiarity in synthetic media detection.
  • Prior Knowledge of Synthetic Media: Interestingly, participants' self-reported familiarity with synthetic media did not correlate with detection performance, suggesting either a general inadequacy in public knowledge or the sophistication of synthetic media rendering perceptual cues ineffective.
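
To make the "chance-level" claim concrete, the sketch below runs an exact binomial test of the reported 51.2% mean accuracy against the 50% guessing baseline. The trial count (one trial per participant, giving 1276 trials) is a hypothetical simplification rather than the paper's actual design; whether a small deviation from 50% is statistically detectable depends entirely on that count.

```python
from scipy.stats import binomtest

# Hypothetical: treat each of the 1276 participants as a single trial.
# The study's real trials-per-participant count is not assumed here.
n_trials = 1276
n_correct = round(0.512 * n_trials)  # 653 correct responses

# Two-sided exact binomial test against the 50% chance baseline.
result = binomtest(n_correct, n_trials, p=0.5)
print(f"accuracy = {n_correct / n_trials:.3f}, p-value = {result.pvalue:.3f}")
# Under these assumptions the p-value is well above 0.05, consistent with
# accuracy that is statistically indistinguishable from guessing.
```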

Implications

This study underscores the limitations of human perceptual capabilities as a standalone defense against synthetic media deceptions. Despite the participants' near-chance success rate, the nuances in detection performance across different media types and under varying conditions provide critical insights. For instance, the observed variability based on modality and language suggests potential avenues for enhancing education and training programs focused on synthetic media identification. However, the consistent difficulty across scenarios points to an urgent need for sophisticated, non-perceptual countermeasures. These could include advanced machine learning detectors, blockchain-based content authentication, and comprehensive digital literacy initiatives aimed at fostering critical analysis of online content.
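
As one illustration of a non-perceptual countermeasure, the sketch below shows provenance-style content authentication with digital signatures, using the Ed25519 primitives from the Python `cryptography` package. This is a generic signing example under assumed keys and media bytes, not the C2PA standard or the specific blockchain-based approach mentioned above; a real system would also need key distribution and metadata binding.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# A publisher signs media bytes at creation time; any later alteration
# (e.g., a synthetic substitution) invalidates the signature.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

media_bytes = b"...raw image, audio, or video bytes..."
signature = private_key.sign(media_bytes)

# A consumer verifies the bytes against the publisher's public key.
try:
    public_key.verify(signature, media_bytes)
    print("authentic: content matches the publisher's signature")
except InvalidSignature:
    print("warning: content was altered after signing")
```

The design point is that authenticity is established by the publisher at creation time rather than inferred by the viewer's eyes and ears, sidestepping the perceptual limits the study documents.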

Future Outlook

Given the rapid advancement of AI and machine learning technologies, the already narrow gap between genuine and synthetic media is likely to diminish further. This trajectory implies that human detection capabilities, without the aid of technological tools, may become increasingly insufficient. Future research should therefore emphasize not only improving our understanding of how people perceive deceptive digital content but also expediting the development of reliable, scalable, and user-friendly technologies for identifying synthetic media. Moreover, exploring educational interventions that can adapt to the evolving landscape of digital content creation will be critical in empowering individuals to navigate the complexities of modern media consumption safely and responsibly.

The collective results from this study paint a sobering picture of the current state of human vulnerability to synthetic media deceptions. They call for a multi-faceted approach that blends technological, educational, and regulatory efforts to safeguard individuals and societies against the potential harms of increasingly indistinguishable synthetic content.
