
Diffusing Colors: Image Colorization with Text Guided Diffusion (2312.04145v1)

Published 7 Dec 2023 in cs.CV, cs.GR, and cs.LG

Abstract: The colorization of grayscale images is a complex and subjective task with significant challenges. Despite recent progress in employing large-scale datasets with deep neural networks, difficulties with controllability and visual quality persist. To tackle these issues, we present a novel image colorization framework that utilizes image diffusion techniques with granular text prompts. This integration not only produces colorization outputs that are semantically appropriate but also greatly improves the level of control users have over the colorization process. Our method provides a balance between automation and control, outperforming existing techniques in terms of visual quality and semantic coherence. We leverage a pretrained generative Diffusion Model, and show that we can finetune it for the colorization task without losing its generative power or attention to text prompts. Moreover, we present a novel CLIP-based ranking model that evaluates color vividness, enabling automatic selection of the most suitable level of vividness based on the specific scene semantics. Our approach holds potential particularly for color enhancement and historical image colorization.
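To make the idea of scoring "color vividness" concrete, here is a minimal sketch that ranks candidate colorizations by a hand-crafted colorfulness score. It uses the colorfulness metric of Hasler and Süsstrunk (2003), not the learned CLIP-based ranking model described in the abstract, whose details are not given here. The function names `colorfulness` and `rank_by_colorfulness` and the pixel-list input format are assumptions chosen for illustration only.

```python
import math
from statistics import mean, pstdev


def colorfulness(pixels):
    """Hasler-Suesstrunk colorfulness of a list of (R, G, B) tuples in 0-255.

    This is a hand-crafted stand-in, NOT the paper's CLIP-based ranker.
    """
    # Opponent color channels: red-green and yellow-blue.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Combine the spread and the mean offset of the two opponent channels.
    std_rgyb = math.hypot(pstdev(rg), pstdev(yb))
    mean_rgyb = math.hypot(mean(rg), mean(yb))
    return std_rgyb + 0.3 * mean_rgyb


def rank_by_colorfulness(candidates):
    """Return candidate indices ordered from least to most colorful."""
    scores = [colorfulness(c) for c in candidates]
    return sorted(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical toy "images": a flat gray patch and a saturated patch.
    gray = [(128, 128, 128)] * 4
    vivid = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
    print(rank_by_colorfulness([vivid, gray]))
```

Note that the abstract speaks of selecting the most suitable vividness level for the scene semantics, which is not necessarily the most colorful candidate; a fixed metric like this one cannot capture that scene dependence and only orders candidates by raw colorfulness.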

Authors (5)
  1. Nir Zabari (7 papers)
  2. Aharon Azulay (4 papers)
  3. Alexey Gorkor (1 paper)
  4. Tavi Halperin (14 papers)
  5. Ohad Fried (34 papers)
