Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation (2403.02899v1)

Published 5 Mar 2024 in cs.AI

Abstract: Conventional Unsupervised Domain Adaptation (UDA) strives to minimize distribution discrepancy between domains, which neglects to harness rich semantics from data and struggles to handle complex domain shifts. A promising technique is to leverage the knowledge of large-scale pre-trained vision-language models (VLMs) for more guided adaptation. Despite some endeavors, current methods often learn textual prompts to embed domain semantics for source and target domains separately and perform classification within each domain, limiting cross-domain knowledge transfer. Moreover, prompting only the language branch lacks flexibility to adapt both modalities dynamically. To bridge this gap, we propose Domain-Agnostic Mutual Prompting (DAMP) to exploit domain-invariant semantics by mutually aligning visual and textual embeddings. Specifically, the image contextual information is utilized to prompt the language branch in a domain-agnostic and instance-conditioned way. Meanwhile, visual prompts are imposed based on the domain-agnostic textual prompt to elicit domain-invariant visual embeddings. These two branches of prompts are learned mutually with a cross-attention module and regularized with a semantic-consistency loss and an instance-discrimination contrastive loss. Experiments on three UDA benchmarks demonstrate the superiority of DAMP over state-of-the-art approaches.

Summary

  • The paper introduces DAMP, a novel approach that aligns visual and textual embeddings for improved unsupervised domain adaptation.
  • The mutual prompting framework employs cross-attention and bespoke loss functions to enhance domain-invariant representation.
  • Experimental results on benchmarks like Office-Home and VisDA-17 demonstrate DAMP's superior accuracy over state-of-the-art methods.

Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation

Overview

The paper "Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation" introduces a novel approach named Domain-Agnostic Mutual Prompting (DAMP) to address Unsupervised Domain Adaptation (UDA) tasks by leveraging Vision-LLMs (VLMs). Traditional UDA methods often focus on minimizing distribution discrepancies across domains, which can overlook semantic richness within data and struggle with complex domain shifts. DAMP utilizes pre-trained vision-LLMs to enhance guided adaptation and enables dynamic cross-domain knowledge transfer through mutual alignment of visual and textual embeddings.

Methodology

Mutual Prompting Framework

DAMP proposes a domain-agnostic mechanism in which visual and textual embeddings are aligned with each other. Unlike existing methods that learn separate textual prompts for the source and target domains, DAMP uses image contextual information to prompt the language branch in a domain-agnostic, instance-conditioned manner, while visual prompts derived from the textual prompts elicit domain-invariant visual representations. The two sets of prompts are learned mutually through a cross-attention module akin to a Transformer decoder, and the learning process is regularized with a semantic-consistency loss and an instance-discrimination contrastive loss.
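To make the cross-attention step concrete, below is a minimal PyTorch-style sketch of instance-conditioned prompt generation. The module name, prompt count, and dimensions are illustrative assumptions, not the authors' implementation: learnable prompt tokens act as queries that attend to the image patch tokens, so the resulting prompts are conditioned on each individual image.

```python
# Minimal sketch of instance-conditioned prompting via cross-attention
# (illustrative assumption of the general idea, not the authors' code):
# prompt tokens (queries) attend to image patch tokens (keys/values),
# so the resulting prompts depend on the specific input image.
import torch
import torch.nn as nn

class MutualPromptCrossAttention(nn.Module):
    def __init__(self, dim=512, num_heads=8, num_prompts=4):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, image_tokens):                       # (B, N, dim)
        B = image_tokens.size(0)
        q = self.prompts.unsqueeze(0).expand(B, -1, -1)    # (B, P, dim)
        out, _ = self.attn(query=q, key=image_tokens, value=image_tokens)
        return self.norm(out + q)                          # instance-conditioned prompts
```

In DAMP this kind of conditioning runs in both directions between the vision and language branches, which is what makes the prompting "mutual".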

Two-Branch Prompt Learning

The mutual prompting framework operates over two branches (a combined forward pass is sketched after the list):

  • Language-Guided Visual Prompting: The textual prompts guide the vision backbone to generate domain-invariant visual embeddings.
  • Vision-Guided Language Prompting: Visual information prompts textual embeddings to align semantically with instance-specific contexts.
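A rough, hedged sketch of how the two branches might fit together in a single forward pass is shown below. The encoder interfaces (`image_encoder.patch_features`, the prompt-accepting `text_encoder` and `image_encoder` calls, and the two prompter modules) are hypothetical stand-ins for a CLIP-like backbone with prompt slots, not the paper's actual API.

```python
# Rough sketch of the two-branch forward pass (assumed interfaces, not the
# authors' code). The encoders and prompter modules are hypothetical helpers
# standing in for a CLIP-like backbone extended with prompt tokens.
import torch
import torch.nn.functional as F

def damp_forward(images, class_token_embeds, image_encoder, text_encoder,
                 vision_to_text_prompter, text_to_vision_prompter,
                 temperature=0.01):
    # Vision-guided language prompting: image context conditions the textual prompts.
    patch_tokens = image_encoder.patch_features(images)          # (B, N, D)
    text_prompts = vision_to_text_prompter(patch_tokens)         # (B, P, D)
    text_feats = text_encoder(class_token_embeds, text_prompts)  # (B, C, D)

    # Language-guided visual prompting: textual prompts shape the visual prompts.
    visual_prompts = text_to_vision_prompter(text_prompts)       # (B, P, D)
    image_feats = image_encoder(images, visual_prompts)          # (B, D)

    # CLIP-style classification: cosine similarity between the two branches.
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = torch.einsum("bd,bcd->bc", image_feats, text_feats) / temperature
    return logits
```

Classification then follows the usual CLIP recipe: cosine similarity between the image embedding and the per-class text embeddings, scaled by a temperature.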

Loss Functions

To ensure domain-invariant properties of the learned prompts, DAMP introduces two auxiliary loss functions (minimal sketches follow the list):

  • Semantic-Consistency Regularization: Encourages strongly augmented samples to be classified correctly, keeping the predicted semantics stable under strong augmentation.
  • Instance-Discrimination Contrastive Loss: Strengthens the domain-agnostic, instance-specific content of the textual prompts by pushing apart the representations of different images from the same domain.
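The sketches below show one plausible formulation of each regularizer. Both are assumptions for illustration (a FixMatch-style pseudo-label consistency term and a SimCLR-style instance contrast over two views); the paper's exact definitions may differ.

```python
# Illustrative sketches of the two regularizers (assumed formulations;
# the paper's exact losses may differ).
import torch
import torch.nn.functional as F

def semantic_consistency_loss(logits_weak, logits_strong, threshold=0.95):
    # FixMatch-style assumption: confident pseudo-labels from weakly augmented
    # views supervise predictions on strongly augmented views of the same images.
    probs = logits_weak.softmax(dim=-1).detach()
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= threshold).float()
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss * mask).mean()

def instance_discrimination_loss(z1, z2, temperature=0.1):
    # SimCLR-style assumption: two views of the same instance are positives,
    # while other instances in the (same-domain) batch act as negatives,
    # pushing prompt representations of different images apart.
    z1 = F.normalize(z1, dim=-1)                       # (B, D)
    z2 = F.normalize(z2, dim=-1)                       # (B, D)
    logits = z1 @ z2.t() / temperature                 # (B, B)
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```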

Experimentation

DAMP is evaluated on three UDA benchmarks (Office-Home, VisDA-17, and Mini-DomainNet), where it outperforms state-of-the-art UDA approaches. The experiments indicate that the framework effectively combines pre-trained VLM knowledge with source-domain knowledge, achieving strong accuracy even on the more challenging transfer tasks.

Implications and Future Directions

DAMP presents a compelling route to UDA by combining VLMs with a mutual prompting strategy. Its ability to adapt both the vision and language modalities dynamically yields substantial improvements on domain adaptation tasks. Future work may extend the approach to broader settings such as multi-source domain adaptation or domain generalization, and further refinement of the prompting architecture and loss functions could strengthen semantic alignment and transfer robustness across diverse domains.

Conclusion

The paper proposes an innovative approach to UDA based on mutual prompting of vision-language models. By aligning multimodal embeddings to harness domain-invariant semantics, DAMP advances the state of the art in domain adaptation, and its flexibility makes it a promising way to adapt models across visually diverse and semantically complex domains.
