
DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration (2403.10098v1)

Published 15 Mar 2024 in cs.CV

Abstract: Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns, and current methods generalize poorly across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Diffusion (DID) framework for diffusion manifold hallucination correction (DiffMAC), which achieves highly generalizable face restoration across diverse degraded scenes and heterogeneous domains. Specifically, the first diffusion stage aligns the restored face with the spatial feature embedding of the low-quality face via AdaIN; this removes the degradation, but for some hard cases it introduces uncontrollable artifacts. Building on Stage I, Stage II compresses the representation with a manifold information bottleneck (MIB) and finetunes the first diffusion model to improve facial fidelity. DiffMAC effectively counters blind degradation patterns and synthesizes high-quality faces with attribute and identity consistency. Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings. The source code and models will be made public.
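The abstract leans on two standard building blocks: AdaIN, which re-normalizes restored features to match the spatial statistics of the low-quality input, and an information bottleneck, which compresses the Stage I representation before finetuning. The PyTorch-style sketch below illustrates both in their textbook form; it is not the paper's implementation, and the function names, tensor shapes, and the beta weight are illustrative assumptions.

    import torch

    def adain(content_feat, style_feat, eps=1e-5):
        # Adaptive instance normalization (Huang & Belongie, ICCV 2017):
        # re-normalize the content features to carry the per-channel
        # mean/std of the style features. Shapes assumed (N, C, H, W).
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
        s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
        return s_std * (content_feat - c_mean) / c_std + s_mean

    def vib_loss(task_loss, z_mean, z_logvar, beta=1e-3):
        # Variational information bottleneck (Alemi et al., ICLR 2017):
        # task loss plus a beta-weighted KL(q(z|x) || N(0, I)) term that
        # compresses the latent z. The value of beta is a hypothetical
        # default, not one reported by the paper.
        kl = 0.5 * (z_logvar.exp() + z_mean.pow(2) - 1.0 - z_logvar).sum(dim=1)
        return task_loss + beta * kl.mean()

In the DID framing, Stage I would apply adain inside the diffusion backbone to inject the low-quality face's spatial statistics, while Stage II would add the KL penalty when finetuning; the exact placement of these operations is a detail the abstract does not specify.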

Authors (7)
  1. Nan Gao (53 papers)
  2. Jia Li (380 papers)
  3. Huaibo Huang (58 papers)
  4. Zhi Zeng (105 papers)
  5. Ke Shang (18 papers)
  6. Ran He (172 papers)
  7. Shuwu Zhang (7 papers)
Citations (1)
