Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption (2207.10402v2)
Abstract: Despite encouraging progress in deepfake detection, generalization to unseen forgery types remains a significant challenge due to the limited forgery clues explored during training. In contrast, we notice a phenomenon common to all deepfakes: fake video creation inevitably disrupts the statistical regularity of the original video. Inspired by this observation, we propose to boost the generalization of deepfake detection by recognizing "regularity disruption", which never appears in real videos. Specifically, by carefully examining the relevant spatial and temporal properties, we disrupt real videos with a Pseudo-fake Generator, creating a wide range of pseudo-fake videos for training. This practice allows us to detect deepfakes without training on any fake videos and improves generalization in a simple and efficient manner. To jointly capture the spatial and temporal disruptions, we further propose a Spatio-Temporal Enhancement block that learns regularity disruption across space and time on our self-created videos. Comprehensive experiments show that our method achieves excellent performance on several datasets.
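The abstract does not detail the Pseudo-fake Generator itself, but the core idea is to synthesize training negatives by disrupting a real clip's spatial and temporal regularity. The sketch below illustrates that idea in a minimal form: a blurred region is blended back into each frame (spatial disruption) and one frame is duplicated (temporal disruption). The function name and the specific disruptions are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def make_pseudo_fake_clip(clip, seed=None):
    """Turn a real clip (T, H, W, C uint8) into a pseudo-fake one by
    disrupting spatial and temporal regularity (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    T, H, W, C = clip.shape
    fake = clip.astype(np.float32).copy()

    # Spatial disruption: blend a degraded copy of a random rectangular
    # region (a stand-in for a swapped face area) back into every frame.
    h, w = H // 2, W // 2
    y0 = int(rng.integers(0, H - h))
    x0 = int(rng.integers(0, W - w))
    mask = np.zeros((H, W, 1), dtype=np.float32)
    mask[y0:y0 + h, x0:x0 + w] = 1.0
    # Cheap "blur": 2x downsample by striding, then nearest-neighbour upsample.
    degraded = fake[:, ::2, ::2].repeat(2, axis=1).repeat(2, axis=2)[:, :H, :W]
    fake = mask * degraded + (1.0 - mask) * fake

    # Temporal disruption: duplicate one frame, breaking motion continuity.
    t = int(rng.integers(1, T))
    fake[t] = fake[t - 1]
    return fake.astype(clip.dtype), (y0, x0, h, w)
```

A detector trained to separate such self-created clips from untouched real ones never needs actual forgeries, which is what gives this style of training its generalization appeal.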
Authors: Jiazhi Guan, Hang Zhou, Mingming Gong, Errui Ding, Jingdong Wang, Youjian Zhao