ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text (2401.01456v3)
Abstract: Diffusion models have recently demonstrated their effectiveness in generating extremely high-quality images and are now used in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, there has been limited exploration of the potential conflicts between image prompts and sketch inputs, which can severely degrade the results. Therefore, this paper exhaustively investigates reference-based sketch colorization models, which colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the "distribution problem", which is a major shortcoming compared to text-based counterparts, and the capability for zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model that use different image tokens from the pre-trained CLIP image encoder, and we propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs. We evaluate our models comprehensively through qualitative and quantitative experiments as well as a user study.