AI-Generated Video Detection via Spatio-Temporal Anomaly Learning (2403.16638v1)
Abstract: The advancement of generative models has led to the emergence of highly realistic AI-generated videos. Malicious users can easily fabricate non-existent videos to spread false information. This letter proposes an effective AI-generated video detection (AIGVDet) scheme that captures forensic traces with a two-branch spatio-temporal convolutional neural network (CNN). Specifically, two ResNet sub-detectors are trained separately to identify anomalies in the spatial and optical-flow domains, respectively. The outputs of the two sub-detectors are then fused to further enhance discrimination ability. A large-scale generated video dataset (GVD) is constructed as a benchmark for model training and evaluation. Extensive experimental results verify the strong generalization and robustness of our AIGVDet scheme. Code and dataset will be available at https://github.com/multimediaFor/AIGVDet.
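The two-branch design described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the ResNet-50 depth, the per-frame score averaging, the equal-weight score-level fusion, and the class name AIGVDetSketch are all assumptions introduced here; the paper's actual backbone, preprocessing, and fusion rule may differ.

```python
# Minimal sketch (assumptions noted above) of a two-branch spatio-temporal
# detector: one ResNet sub-detector over RGB frames (spatial anomalies) and
# one over optical-flow maps (temporal anomalies), with score-level fusion.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class AIGVDetSketch(nn.Module):
    def __init__(self, fusion_weight: float = 0.5):
        super().__init__()
        # Spatial branch: binary real/fake classifier over individual frames.
        self.spatial_net = resnet50(num_classes=1)
        # Temporal branch: binary classifier over optical-flow fields
        # (e.g. estimated with RAFT and rendered as 3-channel images).
        self.flow_net = resnet50(num_classes=1)
        self.fusion_weight = fusion_weight  # assumed equal weighting

    def forward(self, frames: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) RGB frames; flows: (B, T-1, 3, H, W) flow maps.
        b, t = frames.shape[:2]
        spatial_logits = self.spatial_net(frames.flatten(0, 1)).view(b, t)
        flow_logits = self.flow_net(flows.flatten(0, 1)).view(b, flows.shape[1])

        # Video-level score per branch: average the per-frame probabilities.
        p_spatial = torch.sigmoid(spatial_logits).mean(dim=1)
        p_flow = torch.sigmoid(flow_logits).mean(dim=1)

        # Fuse the two sub-detector scores into one real/fake probability.
        return self.fusion_weight * p_spatial + (1 - self.fusion_weight) * p_flow


if __name__ == "__main__":
    model = AIGVDetSketch()
    frames = torch.randn(2, 8, 3, 224, 224)  # dummy clip of 8 frames
    flows = torch.randn(2, 7, 3, 224, 224)   # dummy flow maps between frames
    print(model(frames, flows).shape)        # -> torch.Size([2])
```

Decision-level fusion of per-branch probabilities is the simplest reading of "results of such sub-detectors are fused"; feature-level fusion would be an equally plausible alternative.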