Deformable One-shot Face Stylization via DINO Semantic Guidance (2403.00459v2)

Published 1 Mar 2024 in cs.CV

Abstract: This paper addresses the complex problem of one-shot face stylization, jointly considering appearance and structure, where previous methods have fallen short. We explore deformation-aware face stylization that diverges from the traditional single-image style reference, opting instead for a real-style image pair. The cornerstone of our method is the use of a self-supervised vision transformer, specifically DINO-ViT, to establish a robust and consistent facial structure representation across both the real and style domains. Our stylization process begins by adapting the StyleGAN generator to be deformation-aware through the integration of spatial transformers (STN). We then introduce two novel constraints for generator fine-tuning under the guidance of DINO semantics: i) a directional deformation loss that regulates directional vectors in DINO space, and ii) a relative structural consistency constraint based on DINO token self-similarities, which ensures diverse generation. Additionally, style mixing is employed to align color generation with the reference, minimizing inconsistent correspondences. This framework delivers enhanced deformability for general one-shot face stylization while remaining efficient, with a fine-tuning time of approximately 10 minutes. Extensive qualitative and quantitative comparisons demonstrate our superiority over state-of-the-art one-shot face stylization methods. Code is available at https://github.com/zichongc/DoesFS
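To make the STN integration and the two DINO-guided constraints concrete, below are minimal PyTorch sketches. They are illustrative reconstructions based only on the abstract, not the authors' implementation: all module names, signatures, layer sizes, and the placement of the warping module are assumptions; see the linked repository for the actual code.

First, a deformation-aware generator can be approximated by inserting a spatial transformer that predicts a dense flow field and warps an intermediate StyleGAN feature map:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureSTN(nn.Module):
    """Hypothetical spatial transformer for StyleGAN feature maps.
    Predicts a per-pixel flow field and resamples the features, letting
    fine-tuning learn exaggerated facial deformations. Channel sizes and
    the layer at which it is inserted are assumptions."""
    def __init__(self, channels: int):
        super().__init__()
        self.flow_net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),  # per-pixel (dx, dy) offsets
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        n = feat.size(0)
        flow = self.flow_net(feat).permute(0, 2, 3, 1)  # (N, H, W, 2)
        # Identity sampling grid in [-1, 1], offset by the predicted flow.
        identity = torch.eye(2, 3, device=feat.device).unsqueeze(0).expand(n, -1, -1)
        grid = F.affine_grid(identity, list(feat.shape), align_corners=False)
        return F.grid_sample(feat, grid + flow, align_corners=False)
```

The two fine-tuning constraints can then be sketched as losses over DINO-ViT features: a directional loss aligning the generated real-to-style direction with the exemplar pair's direction, and a structural loss matching token self-similarity matrices. Here `dino` is an assumed callable returning pooled DINO features, and the `*_tokens` arguments are assumed DINO patch-token tensors:

```python
import torch
import torch.nn.functional as F

def directional_deformation_loss(dino, real_ref, style_ref, src, gen):
    """Hypothetical: the real->style direction of the exemplar pair in
    DINO feature space should agree with the src->gen direction induced
    by the fine-tuned generator."""
    d_ref = dino(style_ref) - dino(real_ref)
    d_gen = dino(gen) - dino(src)
    return (1 - F.cosine_similarity(d_ref.flatten(1), d_gen.flatten(1))).mean()

def self_similarity(tokens: torch.Tensor) -> torch.Tensor:
    """Cosine self-similarity of DINO patch tokens, (N, T, D) -> (N, T, T):
    a largely appearance-invariant structure descriptor."""
    t = F.normalize(tokens, dim=-1)
    return t @ t.transpose(-1, -2)

def structural_consistency_loss(src_tokens, gen_tokens):
    """Hypothetical: preserve relative structure by matching the token
    self-similarities of the source face and its stylized output."""
    return F.mse_loss(self_similarity(src_tokens), self_similarity(gen_tokens))
```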
