- The paper presents a CNN-based Deep Video Prior method that enforces temporal consistency without relying on handcrafted regularizations or optical flow.
- It introduces an Iteratively Reweighted Training (IRT) strategy to resolve multimodal inconsistency in tasks such as video colorization, where multiple plausible outputs exist.
- Experimental results across seven computer vision tasks demonstrate superior performance in maintaining both temporal consistency and data fidelity.
Essay: Analyzing "Blind Video Temporal Consistency via Deep Video Prior"
Overview of the Paper
The paper "Blind Video Temporal Consistency via Deep Video Prior" authored by Chenyang Lei, Yazhou Xing, and Qifeng Chen, proposes an innovative approach towards achieving temporal consistency in video processing. The issue at hand is that the independent application of image processing algorithms to video frames typically results in significant temporal inconsistencies like flickering. Most existing methods seek to address this by enforcing temporal consistency using hand-crafted regularizations or optical flow techniques. This paper, however, introduces a method leveraging the Deep Video Prior (DVP), a convolutional neural network (CNN) framework that maintains temporal consistency without the need for such approaches.
Methodology
The authors introduce a CNN framework that, when trained on a single video, achieves temporal consistency through the structure of the network itself. Unlike traditional methods, this approach does not require a large labeled dataset: training is conducted on a single pair consisting of the original video and its per-frame processed counterpart. The innovation lies in the Deep Video Prior, an extension of the Deep Image Prior previously used for single-image processing. The network is trained to reconstruct the processed frames from the original ones; because a CNN fits the coherent video content before it fits per-frame noise, flickering artifacts are naturally suppressed when training is stopped early.
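The per-video training procedure can be summarized in a few lines of PyTorch-style code. The sketch below is illustrative only: the network passed in as `net`, the hyperparameters, and the fixed epoch count standing in for early stopping are assumptions rather than the authors' exact implementation.

```python
# Minimal sketch of per-video Deep Video Prior training (assumptions: PyTorch,
# an off-the-shelf U-Net-like network supplied as `net`, and in-memory frame
# tensors; the authors' architecture and hyperparameters may differ).
import torch
import torch.nn as nn

def train_dvp(net: nn.Module, input_frames, processed_frames,
              epochs: int = 25, lr: float = 1e-4):
    """Fit `net` to map each original frame to its processed counterpart.

    `input_frames` / `processed_frames` are lists of (1, C, H, W) tensors from
    a single video pair. No optical flow or temporal loss term is used; the
    network structure itself provides the temporal prior, and training stops
    early, before the network starts to reproduce per-frame flicker.
    """
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        for x, y in zip(input_frames, processed_frames):
            optimizer.zero_grad()
            loss = loss_fn(net(x), y)
            loss.backward()
            optimizer.step()
    return net

# Inference: the temporally consistent result is simply the trained network's
# per-frame output, e.g. outputs = [net(x) for x in input_frames].
```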
An additional contribution of the paper is the novel Iteratively Reweighted Training (IRT) strategy designed to tackle multimodal inconsistency. This strategy reweights pixels during training so that the network commits to a single dominant mode instead of averaging over multiple plausible outputs, mitigating artifacts caused by fluctuations between modes in scenarios such as video colorization.
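To make the mode-selection idea concrete, the sketch below shows one plausible form of a reweighted loss: the network emits a "main" and a "minor" prediction, and a per-pixel confidence mask assigns each target pixel to whichever head currently explains it better. The two-head naming, margin value, and masking details are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of an Iteratively Reweighted Training (IRT) style loss.
import torch

def irt_loss(out_main: torch.Tensor, out_minor: torch.Tensor,
             target: torch.Tensor, margin: float = 0.02) -> torch.Tensor:
    # Per-pixel distance of the processed target to each candidate output.
    d_main = (out_main - target).abs().mean(dim=1, keepdim=True)
    d_minor = (out_minor - target).abs().mean(dim=1, keepdim=True)
    # Confidence mask: 1 where the main head already fits the pixel better.
    conf = (d_main < d_minor + margin).float()
    # The main head is trained only on confident (dominant-mode) pixels,
    # while the minor head absorbs the remaining outlier-mode pixels.
    loss_main = (conf * (out_main - target).abs()).mean()
    loss_minor = ((1.0 - conf) * (out_minor - target).abs()).mean()
    return loss_main + loss_minor
```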
Experimental Evaluation
The method was evaluated across seven computer vision tasks, including video colorization, dehazing, and enhancement. It demonstrated superior temporal consistency compared to state-of-the-art methods while also maintaining data fidelity. Notably, the proposed method outperformed baseline methods not only in avoiding flickering (temporal consistency) but also in preserving the quality of the per-frame processed results (data fidelity). Additionally, the framework is versatile, showing competitive results on diverse tasks without requiring optical flow computation or specialized training datasets.
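Temporal consistency in this line of work is commonly quantified with a warping error: the previous frame is warped to the current one using optical flow and the two are compared in non-occluded regions, while data fidelity is measured against the per-frame processed results. The sketch below illustrates such a warping-error computation; the flow and occlusion masks are assumed to come from an external estimator, and the exact evaluation protocol in the paper may differ.

```python
# Illustrative warping-error computation for temporal consistency (PyTorch).
import torch
import torch.nn.functional as F

def backward_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (N, C, H, W) with backward flow `flow` (N, 2, H, W), in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(frame.device)
    coords = base + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)

def warping_error(frame_t, frame_prev, flow_t_to_prev, valid_mask):
    """Mean difference between frame_t and the warped previous frame,
    restricted to non-occluded pixels (valid_mask: (N, 1, H, W) in {0, 1})."""
    warped_prev = backward_warp(frame_prev, flow_t_to_prev)
    diff = (frame_t - warped_prev).abs().mean(dim=1, keepdim=True) * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1.0)
```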
Implications and Future Directions
The implications of this research are significant for the field of video processing. By removing the reliance on large labeled datasets and hand-crafted regularization, the proposed method offers a streamlined approach that could make video enhancement more efficient and accessible. Practically, this suggests potential applications in video editing, augmented reality, and other settings where temporal consistency is crucial for a seamless viewing experience.
Future developments may focus on reducing the computational cost associated with training on individual videos, to facilitate more widespread and practical applications. The foundational concept of Deep Video Prior could see extensions into related domains such as 3D image consistency, multi-view imaging, and possibly even temporally consistent synthesis in generative modeling scenarios.
Conclusion
This paper contributes a potent method to the evolving toolkit for video processing. By leveraging the structure of a neural network to implicitly enforce temporal consistency through the Deep Video Prior, the authors have laid the groundwork for moving beyond traditional paradigms reliant on explicit temporal regularization. The research paves the way for advancing AI methodologies for video-based content, promising progress in both academic and practical implementations of seamless, consistent video processing.