
Coherent Online Video Style Transfer (1703.09211v2)

Published 27 Mar 2017 in cs.CV

Abstract: Training a feed-forward network for fast neural style transfer of images has proven to be successful. However, the naive extension to process video frame by frame is prone to producing flickering results. We propose the first end-to-end network for online video style transfer, which generates temporally coherent stylized video sequences in near real-time. Two key ideas include an efficient network incorporating short-term coherence, and propagating short-term coherence to long-term coherence, which ensures consistency over a longer period of time. Our network can incorporate different image stylization networks. We show that the proposed method clearly outperforms the per-frame baseline both qualitatively and quantitatively. Moreover, it can achieve visually comparable coherence to optimization-based video style transfer, but is three orders of magnitude faster in runtime.

Citations (275)

Summary

  • The paper introduces the first end-to-end network for online video style transfer that integrates flow and mask sub-networks to ensure temporal consistency.
  • It leverages short-term feature flow estimation and propagates these cues for long-term coherence, effectively reducing flickering artifacts in videos.
  • The method achieves near real-time performance, operating three orders of magnitude faster than optimization-based approaches while maintaining high stylization quality.

Coherent Online Video Style Transfer

This paper introduces a method for addressing temporal inconsistency in neural style transfer for video. Standard approaches, which extend image-based feed-forward style transfer networks to video by processing frames independently, are prone to flickering artifacts: minor variations between input frames can produce noticeably different stylized outputs. The authors propose a coherent online video style transfer technique that exploits both short-term and long-term temporal coherence.
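For contrast with the coherent approach, the per-frame baseline amounts to applying a single-image stylization network to every frame in isolation. A minimal sketch, assuming a generic pre-trained stylize function (hypothetical name):

```python
# Naive per-frame baseline: every frame is stylized independently, so small
# differences between consecutive inputs can yield visibly different outputs,
# which is what appears as flicker in the stylized video.
def stylize_per_frame(frames, stylize):
    return [stylize(frame) for frame in frames]
```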

Key Contributions

  1. End-to-End Network Architecture: The authors develop the first end-to-end network specifically designed for online video style transfer. Flow and mask sub-networks are integrated into a pre-trained image stylization framework to enforce short-term and long-term consistency, producing smooth, stable stylized outputs while remaining adaptable to different style transfer models.
  2. Short-Term and Long-Term Coherence:
    • Short-Term Coherence: Dense feature correspondences, or feature flow, are estimated between consecutive frames by the flow sub-network. This motion estimate aligns stylization patterns across adjacent frames and minimizes flickering.
    • Long-Term Coherence: Propagating short-term coherence frame by frame offers a practical approximation of long-term coherence: although only adjacent-frame relationships are modeled explicitly, carrying them forward keeps longer video sequences consistent. A minimal sketch of this warp-and-composite recurrence appears after this list.
  3. Efficient Execution: The proposed network achieves temporal consistency while remaining computationally efficient. Its stylization is visually comparable in coherence to optimization-based methods yet three orders of magnitude faster, making it feasible for near real-time applications such as live video processing.
  4. General Applicability: The network is versatile, able to integrate with various existing image stylization networks, including per-style-per-network and multiple-style-per-network architectures. It allows for the transfer of flow and mask estimations even to new styles, reinforcing the model's robustness and flexibility.
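The warp-and-composite recurrence behind contributions 1 and 2 can be sketched as follows. This is a minimal PyTorch-style sketch under stated assumptions, not the authors' implementation: encoder, decoder, flow_net, and mask_net are hypothetical stand-ins for the pre-trained stylization network (split at an intermediate feature layer) and the flow and mask sub-networks, and feeding the composited features back into the next step is how short-term coherence is propagated toward long-term consistency.

```python
import torch
import torch.nn.functional as F


def warp(features, flow):
    """Backward-warp a feature map (N, C, H, W) with a dense flow field (N, 2, H, W).

    The flow is assumed to point from the current frame back to the previous one,
    in pixel units; features are resampled with bilinear interpolation.
    """
    n, _, h, w = features.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(features.device)   # (2, H, W), (x, y)
    coords = base.unsqueeze(0) + flow                                 # (N, 2, H, W)
    # Normalize coordinates to [-1, 1] as expected by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=3)                               # (N, H, W, 2)
    return F.grid_sample(features, grid, align_corners=True)


def stylize_video(frames, encoder, decoder, flow_net, mask_net):
    """Stylize frames one by one, reusing warped previous features where reliable.

    encoder/decoder: a pre-trained feed-forward stylization network split at an
        intermediate feature layer (hypothetical interface).
    flow_net: predicts feature-level flow from the current frame to the previous one.
    mask_net: predicts a per-pixel blending mask in [0, 1]; 1 keeps the warped
        previous features, 0 uses features computed from the current frame.
    """
    outputs, prev_feat = [], None
    for t, frame in enumerate(frames):
        feat = encoder(frame)
        if prev_feat is not None:
            flow = flow_net(frames[t - 1], frame)          # motion between adjacent frames
            warped = warp(prev_feat, flow)                 # align previous stylized features
            mask = mask_net(warped, feat)                  # where the warp is trustworthy
            feat = mask * warped + (1.0 - mask) * feat     # composite the two feature maps
        outputs.append(decoder(feat))
        prev_feat = feat   # carrying the composite forward propagates coherence over time
    return outputs
```

The mask is what handles occlusions and disocclusions: where the warp is unreliable, the blend falls back to features freshly computed from the current frame instead of dragging stale content forward.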

Experimental Insights

The empirical evaluation, conducted on both synthetic and real video datasets, demonstrates significant improvements in temporal coherence compared to frame-independent stylization. The quantitative analysis provided includes stability error measurements that confirm the efficacy of the method in maintaining temporal consistency while preserving stylization quality. Furthermore, the runtime performance assessment indicates the network’s viability for near real-time processing.
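Stability errors of this kind are typically computed by warping the previous stylized frame into the current one with ground-truth optical flow and measuring the discrepancy over non-occluded pixels. The sketch below is a generic formulation under that assumption, not necessarily the exact metric used in the paper; the flow and valid_mask inputs are assumed to come from a synthetic dataset that provides ground truth.

```python
import numpy as np


def stability_error(stylized_prev, stylized_curr, flow, valid_mask):
    """Temporal stability error between two consecutive stylized frames.

    stylized_prev, stylized_curr: (H, W, 3) float arrays.
    flow: (H, W, 2) ground-truth flow from frame t back to frame t-1, in pixels.
    valid_mask: (H, W) binary mask of non-occluded pixels with a valid correspondence.
    """
    h, w = valid_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour backward warp of the previous stylized frame into frame t.
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped_prev = stylized_prev[src_y, src_x]
    sq_diff = ((stylized_curr - warped_prev) ** 2).sum(axis=-1)
    return float(sq_diff[valid_mask.astype(bool)].mean())
```

A lower value means the stylization moves with the scene rather than flickering; averaging it over all consecutive frame pairs of a clip gives a single per-video score.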

Implications and Future Directions

This work paves the way for more temporally coherent video stylization techniques that can be effectively applied across various domains, including entertainment, artistic software applications, and real-time video manipulation interfaces. The successful propagation of short-term coherence to achieve long-term consistency is a promising direction for future research.

Challenges such as managing accumulated propagation errors over prolonged periods or handling rapid motion scenarios remain open. Future research might focus on integrating advanced motion estimation techniques or exploring generalized temporal coherence mechanisms that adapt dynamically to varying video content.

In conclusion, this paper provides a comprehensive solution to a long-standing problem in the video style transfer domain, offering both theoretical advancements and practical applications in neural network-based video processing.