LanDA: Language-Guided Multi-Source Domain Adaptation (2401.14148v1)
Abstract: Multi-Source Domain Adaptation (MSDA) aims to mitigate shifts in data distribution when transferring knowledge from multiple labeled source domains to an unlabeled target domain. Existing MSDA techniques, however, assume that target domain images are available, yet overlook the rich semantic information those images carry. An open question is therefore whether MSDA can be guided solely by textual cues in the absence of any target domain images. Leveraging a multimodal model with a joint image and language embedding space, we propose a novel language-guided MSDA approach, termed LanDA, based on optimal transport theory. LanDA transfers multiple source domains to a new target domain given only a textual description of that domain, without requiring even a single target domain image, while retaining task-relevant information. Extensive experiments across diverse transfer scenarios on a suite of relevant benchmarks demonstrate that LanDA outperforms standard fine-tuning and ensemble approaches in both the target and source domains.
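The abstract describes aligning source-domain features to a text-described target domain inside a joint image–language embedding space using optimal transport. The paper's actual objective is not given in this excerpt, so the following is only a minimal illustrative sketch: it uses entropy-regularized optimal transport (Sinkhorn iterations, as in Cuturi, 2013) to couple hypothetical source image embeddings with target anchors that, in LanDA's setting, would be synthesized from a textual description of the target domain. All array shapes and the cosine cost are illustrative assumptions.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT: returns a transport plan P with uniform
    marginals that approximately minimizes <P, cost> - eps * H(P)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform source marginal
    b = np.full(m, 1.0 / m)          # uniform target marginal
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):         # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
# Hypothetical L2-normalized embeddings in a CLIP-like joint space:
# `src` stands for source image features, `tgt` for target anchors
# that would be derived from a text description of the target domain.
src = rng.normal(size=(8, 4)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = rng.normal(size=(6, 4)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)

cost = 1.0 - src @ tgt.T             # cosine cost in the shared space
P = sinkhorn(cost)                   # soft coupling of sources to targets
# Barycentric projection: map each source feature toward the target domain.
aligned = (P / P.sum(axis=1, keepdims=True)) @ tgt
```

This only illustrates the transport step; LanDA additionally constrains the mapping to retain task-relevant (label-predictive) information, which a plain barycentric projection does not enforce.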
Authors: Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu