Prompting Diffusion Representations for Cross-Domain Semantic Segmentation (2307.02138v1)
Abstract: While originally designed for image generation, diffusion models have recently been shown to provide excellent pretrained feature representations for semantic segmentation. Intrigued by this result, we set out to explore how well diffusion-pretrained representations generalize to new domains, a crucial ability for any representation. We find that diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation, outperforming both supervised and self-supervised backbone networks. Motivated by this, we investigate how to exploit the model's unique ability to take an input prompt in order to further enhance its cross-domain performance. We introduce a scene prompt and a prompt randomization strategy that help further disentangle the domain-invariant information when training the segmentation head. Moreover, we propose a simple but highly effective approach for test-time domain adaptation, based on learning a scene prompt on the target domain in an unsupervised manner. Extensive experiments on four synthetic-to-real and clear-to-adverse-weather benchmarks demonstrate the effectiveness of our approaches. Without resorting to any complex techniques, such as image translation, augmentation, or rare-class sampling, we set a new state of the art on all benchmarks. Our implementation will be publicly available at https://github.com/ETHRuiGong/PTDiffSeg.
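The abstract does not say which unsupervised objective drives test-time scene-prompt learning, nor how the prompt enters the network. What follows is therefore a minimal toy sketch under explicit assumptions, not the authors' method. It assumes a hypothetical frozen linear "backbone" in which a learnable prompt vector shifts every pixel feature by `proj @ prompt`, and a hypothetical frozen linear segmentation head. It also assumes mean prediction entropy over unlabeled target pixels as the loss. That loss is entropy minimization in the style of Tent, swapped in here as an assumption. Only the prompt is updated. All names (`adapt_prompt`, `proj`, `head`) and the pixel values are illustrative.

```python
import math


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy(q):
    # Shannon entropy (in nats) of a probability vector.
    return -sum(p * math.log(p) for p in q if p > 0)


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def pixel_logits(x, prompt, proj, head):
    # HYPOTHETICAL frozen "backbone": the prompt shifts the feature by proj @ prompt.
    feat = [xi + si for xi, si in zip(x, matvec(proj, prompt))]
    # HYPOTHETICAL frozen segmentation head: a linear classifier.
    return matvec(head, feat)


def mean_entropy(pixels, prompt, proj, head):
    # ASSUMPTION: the unsupervised target-domain loss is mean prediction entropy.
    return sum(entropy(softmax(pixel_logits(x, prompt, proj, head)))
               for x in pixels) / len(pixels)


def adapt_prompt(pixels, prompt, proj, head, lr=0.5, steps=50):
    """Gradient descent on the prompt only; proj and head stay frozen."""
    p = list(prompt)  # copy, so the caller's prompt is not mutated
    n_cls, dim = len(head), len(p)
    # In this linear toy model, d(logits)/d(prompt) = head @ proj is constant.
    jac = [[sum(head[c][k] * proj[k][d] for k in range(len(proj)))
            for d in range(dim)] for c in range(n_cls)]
    for _ in range(steps):
        grad = [0.0] * dim
        for x in pixels:
            q = softmax(pixel_logits(x, p, proj, head))
            h = entropy(q)
            # Gradient of entropy w.r.t. logits: dH/dz_c = -q_c * (log q_c + H).
            dz = [-qc * (math.log(max(qc, 1e-12)) + h) for qc in q]
            for d in range(dim):
                grad[d] += sum(dz[c] * jac[c][d] for c in range(n_cls)) / len(pixels)
        p = [pd - lr * gd for pd, gd in zip(p, grad)]
    return p


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    target_pixels = [[0.2, 0.0], [0.4, 0.1], [-0.1, 0.0]]  # made-up, low-confidence
    p0 = [0.0, 0.0]
    p1 = adapt_prompt(target_pixels, p0, identity, identity)
    print("mean entropy before:", mean_entropy(target_pixels, p0, identity, identity))
    print("mean entropy after: ", mean_entropy(target_pixels, p1, identity, identity))
```

Freezing everything except the prompt keeps adaptation cheap and leaves the source-trained head untouched. In a text-conditioned latent diffusion model such as Stable Diffusion, the prompt is typically a text-encoder embedding consumed through cross-attention. This linear toy does not model that.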
- Rui Gong
- Martin Danelljan
- Han Sun
- Julio Delgado Mangas
- Luc Van Gool