Enhancing Deformable Convolution based Video Frame Interpolation with Coarse-to-fine 3D CNN (2202.07731v2)
Abstract: This paper presents a new deformable convolution-based video frame interpolation (VFI) method that uses a coarse-to-fine 3D CNN to enhance multi-flow prediction. The model first extracts spatio-temporal features at multiple scales using a 3D CNN, then estimates multi-flows from these features in a coarse-to-fine manner. The estimated multi-flows are used to warp both the original input frames and their context maps, and the warped results are fused by a synthesis network to produce the final output. This VFI approach has been fully evaluated against 12 state-of-the-art VFI methods on three commonly used test databases. The results show that the proposed method offers superior interpolation performance over the other state-of-the-art algorithms, with PSNR gains of up to 0.19 dB.
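For intuition, below is a minimal single-scale sketch of the kind of pipeline the abstract describes: a small 3D CNN extracts spatio-temporal features from the two input frames, a prediction head outputs per-pixel kernel offsets ("multi-flows") and modulation weights, each frame is warped with torchvision's `deform_conv2d`, and a synthesis layer fuses the warped results. Everything here is an assumption for illustration: the class and layer names are hypothetical, the layer sizes are not the paper's, and the coarse-to-fine multi-scale refinement and context-map warping are omitted for brevity. Only `torchvision.ops.deform_conv2d` is a real API.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableVFISketch(nn.Module):
    """Illustrative sketch only, not the paper's architecture.

    A 3D CNN encodes the two-frame clip, a 2D head predicts per-pixel
    deformable-kernel offsets ("multi-flows") and modulation weights,
    each input frame is warped via deform_conv2d, and a synthesis
    layer fuses the two warped frames into the interpolated output.
    """

    K = 3  # deformable kernel size (K x K sample points per pixel)

    def __init__(self, feat=32):
        super().__init__()
        # 3D conv over the (T=2, H, W) clip -> spatio-temporal features.
        self.encoder3d = nn.Sequential(
            nn.Conv3d(3, feat, kernel_size=(2, 3, 3), padding=(0, 1, 1)),
            nn.ReLU(inplace=True),
        )
        k2 = self.K * self.K
        # Per frame: 2*K*K offsets + K*K modulation weights.
        self.head = nn.Conv2d(feat, 2 * (2 * k2 + k2), 3, padding=1)
        # Fixed all-ones depthwise weight: deform_conv2d then simply sums
        # the K*K sampled values per channel, weighted by the mask.
        self.register_buffer("w", torch.ones(3, 1, self.K, self.K))
        self.synth = nn.Conv2d(6, 3, 3, padding=1)  # fuse the two warps

    def warp(self, frame, offset, mask):
        # Sample each pixel's K*K neighbourhood at learned offsets,
        # weighted by the learned (sigmoid-activated) mask.
        return deform_conv2d(frame, offset, self.w,
                             padding=self.K // 2, mask=mask)

    def forward(self, f0, f1):
        clip = torch.stack([f0, f1], dim=2)       # (N, 3, 2, H, W)
        feat = self.encoder3d(clip).squeeze(2)    # (N, feat, H, W)
        k2 = self.K * self.K
        o0, m0, o1, m1 = self.head(feat).split(
            [2 * k2, k2, 2 * k2, k2], dim=1)
        w0 = self.warp(f0, o0, torch.sigmoid(m0))  # warped frame 0
        w1 = self.warp(f1, o1, torch.sigmoid(m1))  # warped frame 1
        return self.synth(torch.cat([w0, w1], dim=1))


if __name__ == "__main__":
    model = DeformableVFISketch()
    f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(model(f0, f1).shape)  # torch.Size([1, 3, 64, 64])
```

In the full method, this offset-plus-weight prediction would be repeated across a feature pyramid, with coarse-scale flows upsampled and refined at finer scales before the final synthesis step.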
Authors: Duolikun Danier, Fan Zhang, David Bull