Three-Stage Cascade Framework for Blurry Video Frame Interpolation (2310.05383v1)
Abstract: Blurry video frame interpolation (BVFI), which aims to generate high-frame-rate clear videos from low-frame-rate blurry videos, is a challenging but important topic in the computer vision community. Blurry videos not only provide spatial and temporal information like clear videos, but also contain additional motion information hidden in each blurry frame. However, existing BVFI methods usually fail to fully leverage all of this information, which ultimately hinders their performance. In this paper, we propose a simple end-to-end three-stage framework that fully exploits the useful information in blurry videos. The frame interpolation stage uses a temporal deformable network to sample useful information directly from the blurry inputs and synthesize an intermediate frame at an arbitrary time step. The temporal feature fusion stage exploits long-term temporal information for each target frame through a bi-directional recurrent deformable alignment network. Finally, the deblurring stage applies a transformer-empowered Taylor approximation network to recursively recover high-frequency details. The proposed framework assigns a clear task to each module and offers good extensibility, as demonstrated by extensive experiments. We evaluate our model on four benchmarks: the Adobe240, GoPro, YouTube240, and Sony datasets. Quantitative and qualitative results show that our model outperforms existing state-of-the-art methods, and experiments on real-world blurry videos further demonstrate its strong generalization ability.
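The following is a minimal PyTorch sketch of the three-stage data flow implied by the abstract. It is not the authors' implementation: every class name, channel width, and module body is an assumption, and each stage's internals (temporal deformable sampling, bi-directional recurrent deformable alignment, Taylor-approximation deblurring) are reduced to small convolutional placeholders so the cascade runs end to end.

```python
# Hedged sketch of the three-stage cascade: all module internals are
# simplified placeholders standing in for the paper's actual networks.
import torch
import torch.nn as nn


class FrameInterpolationStage(nn.Module):
    """Stage 1: synthesize an intermediate frame at time t from two blurry
    inputs (placeholder for the temporal deformable network)."""
    def __init__(self, ch=32):
        super().__init__()
        # 2 RGB frames + 1 time-step channel -> intermediate RGB frame
        self.net = nn.Sequential(
            nn.Conv2d(7, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, frame0, frame1, t):
        b, _, h, w = frame0.shape
        t_map = torch.full((b, 1, h, w), t, device=frame0.device)
        return self.net(torch.cat([frame0, frame1, t_map], dim=1))


class TemporalFusionStage(nn.Module):
    """Stage 2: fuse long-term temporal context for each target frame
    (placeholder for the bi-directional recurrent deformable alignment)."""
    def __init__(self, ch=32):
        super().__init__()
        self.fwd = nn.Conv2d(3 + ch, ch, 3, padding=1)
        self.bwd = nn.Conv2d(3 + ch, ch, 3, padding=1)
        self.out = nn.Conv2d(2 * ch, 3, 3, padding=1)
        self.ch = ch

    def forward(self, frames):  # frames: list of (B, 3, H, W) tensors
        b, _, h, w = frames[0].shape
        state = frames[0].new_zeros(b, self.ch, h, w)
        fwd_feats = []
        for f in frames:  # forward recurrence in time
            state = torch.relu(self.fwd(torch.cat([f, state], dim=1)))
            fwd_feats.append(state)
        state = frames[0].new_zeros(b, self.ch, h, w)
        outs = []
        for f, ff in zip(reversed(frames), reversed(fwd_feats)):
            # backward recurrence, fused with the forward features
            state = torch.relu(self.bwd(torch.cat([f, state], dim=1)))
            outs.append(self.out(torch.cat([ff, state], dim=1)))
        return list(reversed(outs))


class DeblurringStage(nn.Module):
    """Stage 3: recursively restore high-frequency detail (placeholder for
    the transformer-empowered Taylor approximation network)."""
    def __init__(self, ch=32, steps=3):
        super().__init__()
        self.steps = steps
        self.refine = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        # Each iteration adds a residual correction, loosely mirroring a
        # truncated Taylor expansion of the restoration mapping.
        for _ in range(self.steps):
            x = x + self.refine(x)
        return x


class ThreeStageCascade(nn.Module):
    def __init__(self):
        super().__init__()
        self.interp = FrameInterpolationStage()
        self.fusion = TemporalFusionStage()
        self.deblur = DeblurringStage()

    def forward(self, frame0, frame1, times):
        mids = [self.interp(frame0, frame1, t) for t in times]
        fused = self.fusion([frame0] + mids + [frame1])
        return [self.deblur(f) for f in fused]


if __name__ == "__main__":
    f0, f1 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    outs = ThreeStageCascade()(f0, f1, times=[0.25, 0.5, 0.75])
    print(len(outs), outs[0].shape)  # 5 torch.Size([1, 3, 64, 64])
```

The cascade makes the clear task assignment explicit: stage 1 synthesizes intermediate frames, stage 2 refines every frame with bi-directional temporal context, and stage 3 deblurs each fused frame, so any stage can be swapped for a stronger module without disturbing the others, which is the extensibility the abstract refers to.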