IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis (2403.13378v2)
Abstract: Semantic image synthesis aims to generate high-quality images from semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and synthesize the image directly in a single forward step. In this paper, semantic image synthesis is instead treated as an image denoising task and handled with a novel image-to-image diffusion model (IIDM). Specifically, the style reference is first contaminated with random noise and then progressively denoised by IIDM, guided by segmentation masks. Moreover, three techniques (refinement, color transfer, and model ensembles) are proposed to further boost generation quality; they are plug-in inference modules and require no additional training. Extensive experiments show that IIDM outperforms existing state-of-the-art methods by clear margins, and further analysis is provided via detailed demonstrations. IIDM is implemented on the Jittor framework; code is available at https://github.com/ader47/jittor-jieke-semantic_images_synthesis.
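The abstract describes two steps that are easy to picture in code: contaminating the style reference with noise before denoising, and a training-free color-transfer pass at inference. The sketch below is illustrative only and is not taken from the IIDM repository. It assumes a DDPM-style linear noise schedule and closed-form forward noising, and it assumes color transfer can be approximated by matching per-channel mean and standard deviation; the abstract specifies neither. All function names are hypothetical, and pixels are plain Python lists scaled to [-1, 1] to keep the example dependency-free.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed DDPM-style schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), usually written alpha-bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_style_reference(pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    keep, spread = math.sqrt(a), math.sqrt(1.0 - a)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]


def match_channel_stats(generated, reference):
    """Shift and scale one channel so its mean and std match the reference.

    Assumption: a simplified stand-in for color transfer, applied per channel.
    """
    g_mean = sum(generated) / len(generated)
    r_mean = sum(reference) / len(reference)
    g_std = math.sqrt(sum((g - g_mean) ** 2 for g in generated) / len(generated))
    r_std = math.sqrt(sum((r - r_mean) ** 2 for r in reference) / len(reference))
    scale = r_std / g_std if g_std > 0 else 1.0
    return [(g - g_mean) * scale + r_mean for g in generated]


# Toy usage: noise a 4-pixel "style reference" halfway through the schedule.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
style = [0.2, -0.5, 0.9, 0.1]
noisy = noise_style_reference(style, t=500, alpha_bars=alpha_bars, rng=rng)

# Toy usage: pull a generated channel toward the reference channel's statistics.
recolored = match_channel_stats([0.0, 1.0, 2.0], [10.0, 12.0, 14.0])
```

In a full pipeline, a mask-conditioned denoiser would take `noisy` together with the segmentation mask and iterate back toward a clean image; that network is omitted here because the abstract does not describe its architecture. Starting from a noised style reference rather than pure noise is what lets the style image influence the result without any extra conditioning branch.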