AI-Generated Video Detection via Spatio-Temporal Anomaly Learning (2403.16638v1)
Abstract: Advances in generative models have led to the emergence of highly realistic AI-generated videos. Malicious users can easily create fabricated videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme that captures forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Specifically, two ResNet sub-detectors are trained separately to identify anomalies in the spatial and optical flow domains, respectively. The results of the two sub-detectors are fused to further enhance the discrimination ability. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation. Extensive experimental results verify the high generalization and robustness of our AIGVDet scheme. Code and dataset will be available at https://github.com/multimediaFor/AIGVDet.
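The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the weighted average below, the per-frame averaging, the 0.5 threshold, and all function names (`video_score`, `fuse_scores`, `is_generated`) are assumptions made for illustration only, not the authors' implementation. The inputs stand in for per-frame "generated" probabilities that the spatial and optical flow sub-detectors would produce.

```python
from statistics import mean
from typing import Sequence


def video_score(frame_probs: Sequence[float]) -> float:
    """Average per-frame 'generated' probabilities into one video-level score.

    Assumption: frame scores are aggregated by a plain mean.
    """
    if not frame_probs:
        raise ValueError("need at least one frame-level probability")
    return mean(frame_probs)


def fuse_scores(spatial: float, flow: float, w_spatial: float = 0.5) -> float:
    """Decision-level fusion of the two branch scores.

    Assumption: a weighted average; w_spatial is a hypothetical weight.
    """
    if not 0.0 <= w_spatial <= 1.0:
        raise ValueError("w_spatial must lie in [0, 1]")
    return w_spatial * spatial + (1.0 - w_spatial) * flow


def is_generated(
    spatial_probs: Sequence[float],
    flow_probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical decision threshold
    w_spatial: float = 0.5,
) -> bool:
    """Flag a video as AI-generated when the fused score reaches the threshold."""
    fused = fuse_scores(video_score(spatial_probs), video_score(flow_probs), w_spatial)
    return fused >= threshold
```

In this sketch, a video whose spatial branch is confident but whose optical flow branch is uncertain can still be flagged, which is the kind of complementary behavior a fused two-branch detector is meant to exploit.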