AI-Generated Video Detection via Spatio-Temporal Anomaly Learning (2403.16638v1)
Abstract: The advancement of generative models has led to the emergence of highly realistic AI-generated videos. Malicious users can easily fabricate non-existent videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme that captures forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Specifically, two ResNet sub-detectors are trained separately to identify anomalies in the spatial and optical-flow domains, respectively. The outputs of the two sub-detectors are then fused to further enhance discrimination ability. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation. Extensive experimental results verify the strong generalization and robustness of our AIGVDet scheme. Code and dataset will be available at https://github.com/multimediaFor/AIGVDet.
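The two-branch design described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the ResNet-50 depth, the per-frame score averaging, the equal-weight score-level fusion, and the class name AIGVDetSketch are all assumptions introduced here; the paper's actual backbone, preprocessing, and fusion rule may differ.

```python
# Minimal sketch (assumptions noted above) of a two-branch spatio-temporal
# detector: one ResNet sub-detector over RGB frames (spatial anomalies) and
# one over optical-flow maps (temporal anomalies), with score-level fusion.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class AIGVDetSketch(nn.Module):
    def __init__(self, fusion_weight: float = 0.5):
        super().__init__()
        # Spatial branch: binary real/fake classifier over individual frames.
        self.spatial_net = resnet50(num_classes=1)
        # Temporal branch: binary classifier over optical-flow fields
        # (e.g. estimated with RAFT and rendered as 3-channel images).
        self.flow_net = resnet50(num_classes=1)
        self.fusion_weight = fusion_weight  # assumed equal weighting

    def forward(self, frames: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) RGB frames; flows: (B, T-1, 3, H, W) flow maps.
        b, t = frames.shape[:2]
        spatial_logits = self.spatial_net(frames.flatten(0, 1)).view(b, t)
        flow_logits = self.flow_net(flows.flatten(0, 1)).view(b, flows.shape[1])

        # Video-level score per branch: average the per-frame probabilities.
        p_spatial = torch.sigmoid(spatial_logits).mean(dim=1)
        p_flow = torch.sigmoid(flow_logits).mean(dim=1)

        # Fuse the two sub-detector scores into one real/fake probability.
        return self.fusion_weight * p_spatial + (1 - self.fusion_weight) * p_flow


if __name__ == "__main__":
    model = AIGVDetSketch()
    frames = torch.randn(2, 8, 3, 224, 224)  # dummy clip of 8 frames
    flows = torch.randn(2, 7, 3, 224, 224)   # dummy flow maps between frames
    print(model(frames, flows).shape)        # -> torch.Size([2])
```

Decision-level fusion of per-branch probabilities is the simplest reading of "results of such sub-detectors are fused"; feature-level fusion would be an equally plausible alternative.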