MotionEditor: Editing Video Motion via Content-Aware Diffusion (2311.18830v1)

Published 30 Nov 2023 in cs.CV

Abstract: Existing diffusion-based video editing models have made impressive advances in editing a source video's attributes over time, but they struggle to manipulate motion while preserving the original protagonist's appearance and background. To address this, we propose MotionEditor, a diffusion model for video motion editing. MotionEditor incorporates a novel content-aware motion adapter into ControlNet to capture temporal motion correspondence. While ControlNet enables direct generation conditioned on skeleton poses, it encounters challenges when modifying the source motion in the inverted noise, due to contradictory signals between the noise (source) and the condition (reference). Our adapter complements ControlNet by incorporating source content to transfer adapted control signals seamlessly. Furthermore, we build a two-branch architecture (a reconstruction branch and an editing branch) with a high-fidelity attention injection mechanism that facilitates interaction between the branches. This mechanism lets the editing branch query keys and values from the reconstruction branch in a decoupled manner, so the editing branch retains the original background and protagonist appearance. We also propose a skeleton alignment algorithm to address discrepancies in pose size and position. Experiments demonstrate the promising motion editing ability of MotionEditor, both qualitatively and quantitatively.
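
The attention injection described in the abstract is the mechanism that anchors appearance: during denoising, the editing branch keeps its own queries but draws keys and values from the reconstruction branch, so edited frames can "look up" the source background and protagonist. Below is a minimal, hypothetical PyTorch sketch of that query/key-value split; it is not the authors' implementation, all names (InjectedAttention, d_model, the toy tensors) are illustrative, and the paper's actual mechanism additionally decouples foreground from background.

```python
# Minimal sketch (illustrative, not the authors' code) of attention injection:
# queries from the editing branch, keys/values from the reconstruction branch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InjectedAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.to_q = nn.Linear(d_model, d_model, bias=False)
        self.to_k = nn.Linear(d_model, d_model, bias=False)
        self.to_v = nn.Linear(d_model, d_model, bias=False)
        self.to_out = nn.Linear(d_model, d_model)

    def _split(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, tokens, d_model) -> (batch, heads, tokens, d_head)
        b, n, _ = x.shape
        return x.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, edit_tokens: torch.Tensor, recon_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the editing branch; keys/values are injected
        # from the reconstruction branch, i.e. from the source video features.
        q = self._split(self.to_q(edit_tokens))
        k = self._split(self.to_k(recon_tokens))
        v = self._split(self.to_v(recon_tokens))
        attn = F.scaled_dot_product_attention(q, k, v)  # PyTorch >= 2.0
        b, h, n, d = attn.shape
        return self.to_out(attn.transpose(1, 2).reshape(b, n, h * d))

# Toy usage: 2 frames' worth of 64 latent tokens with 320 channels each.
layer = InjectedAttention(d_model=320)
edit_tokens = torch.randn(2, 64, 320)   # editing-branch latents
recon_tokens = torch.randn(2, 64, 320)  # reconstruction-branch latents
out = layer(edit_tokens, recon_tokens)
print(out.shape)  # torch.Size([2, 64, 320])
```

Injecting keys and values rather than overwriting the editing branch's hidden states is what lets the edited motion diverge from the source while appearance and background stay anchored to the reconstruction.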

Authors (7)
  1. Shuyuan Tu (5 papers)
  2. Qi Dai (58 papers)
  3. Zhi-Qi Cheng (61 papers)
  4. Han Hu (197 papers)
  5. Xintong Han (36 papers)
  6. Zuxuan Wu (144 papers)
  7. Yu-Gang Jiang (223 papers)
Citations (17)
