- The paper presents unsupervised domain adaptation methods focused on aligning source and target data distributions.
- It categorizes approaches into shallow and deep strategies, utilizing distance metrics and adversarial techniques to mitigate domain shifts.
- The review notes limitations, such as the paper's narrow focus on unsupervised adaptation and its brief treatment of individual techniques, and suggests the need for more detailed follow-up studies.
Domain Adaptation: A Review
This paper provides a concise overview of domain adaptation (DA), a subfield of machine learning focused on addressing the challenges posed when training and test data originate from different distributions. The paper emphasizes unsupervised domain adaptation (UDA), where labels are available only in the source domain.
Background and Motivation
Traditional machine learning assumes that training and test data are drawn from identical distributions. This assumption is often violated in real-world scenarios due to factors such as evolving data characteristics or disparate data sources. DA techniques aim to mitigate the performance degradation that occurs when models trained on one distribution (the source domain) are applied to a different distribution (the target domain). DA is presented as a specific instance of transfer learning, where the primary distinction lies in the scope of change between source and target: transfer learning addresses changes in both tasks and domains, while DA focuses solely on domain shifts.
Domain Adaptation Taxonomy
The paper categorizes DA approaches based on the relationship between source and target label spaces, identifying four main categories:
- Closed Set DA: Source and target domains share the same classes, but their probability distributions differ.
- Open Set DA: The domains share some labels, but each also has private labels. In modified open set DA, the source label set is a subset of the target label set.
- Partial DA: The target label set is a subset of the source label set.
- Universal DA: No prior knowledge of the label sets is required; the domains may share some labels while each may also hold private labels. Universal DA aims to identify the shared label space and align the distributions accordingly.
Additionally, the paper identifies three types of domain shift:
- Prior Shift: Posterior distributions are equivalent (p_s(y|x) = p_t(y|x)), but prior class distributions differ (p_s(y) ≠ p_t(y)).
- Covariate Shift: Marginal probability distributions differ (p_s(x) ≠ p_t(x)), while conditional probability distributions remain constant (p_s(y|x) = p_t(y|x)); a minimal sketch of this case follows the list.
- Concept Shift: Data distributions remain unchanged (p_s(x) = p_t(x)), but conditional distributions differ (p_s(y|x) ≠ p_t(y|x)).
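As a concrete illustration of the covariate shift case (not taken from the paper), the following minimal NumPy sketch generates two domains that share the labeling rule p(y|x) while their input distributions p(x) differ; the distribution parameters and function names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    # Shared labeling rule: p(y | x) is identical in both domains.
    return (x[:, 0] + x[:, 1] > 0).astype(int)

# Covariate shift: the marginal p(x) differs between source and target.
x_source = rng.normal(loc=-1.0, scale=1.0, size=(1000, 2))
x_target = rng.normal(loc=1.5, scale=0.5, size=(1000, 2))

y_source = label(x_source)   # labels available in the source domain
y_target = label(x_target)   # held out in the unsupervised setting

print("source positive rate:", y_source.mean())
print("target positive rate:", y_target.mean())
```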
Domain Adaptation Approaches
The paper categorizes existing DA approaches into shallow and deep methods.
Shallow Domain Adaptation
Shallow DA techniques primarily leverage instance-based and feature-based adaptation strategies to align domain distributions. Distance metrics such as Maximum Mean Discrepancy (MMD), Wasserstein metric, Correlation Alignment (CORAL), Kullback-Leibler (KL) divergence, and Contrastive Domain Discrepancy (CDD) are commonly used to quantify and minimize the disparity between domains.
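For readers unfamiliar with MMD, the sketch below gives a minimal NumPy implementation of a biased squared-MMD estimate with a single RBF kernel. It is an illustration of the discrepancy measure itself, not the estimator used by any particular method in the paper; the function names, bandwidth, and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=1.0):
    # Biased estimate of squared MMD:
    # MMD^2 = E[k(s, s')] - 2 E[k(s, t)] + E[k(t, t')].
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss - 2 * k_st + k_tt

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 5))   # source sample
xt = rng.normal(0.5, 1.0, size=(200, 5))   # shifted target sample
print("MMD^2 estimate:", mmd2(xs, xt))
```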
Instance-Based Adaptation
These methods address domain shift by re-weighting source domain samples according to the ratio of the target and source domain densities, w(x) = p_t(x) / p_s(x). Kernel Mean Matching (KMM) is a non-parametric method that estimates these weights by minimizing the MMD in a Reproducing Kernel Hilbert Space (RKHS). The Kullback-Leibler Importance Estimation Procedure (KLIEP) estimates the density ratio directly by minimizing the KL-divergence between the target distribution and the weighted source distribution.
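KMM and KLIEP estimate the weights without modeling the densities explicitly; as a much simpler stand-in that only illustrates the re-weighting idea, the sketch below forms a naive plug-in estimate of w(x) = p_t(x) / p_s(x) from two kernel density estimates. The data, bandwidth, and use of scikit-learn's KernelDensity are assumptions, and this is not the KMM or KLIEP procedure itself.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x_source = rng.normal(-0.5, 1.0, size=(500, 2))
x_target = rng.normal(0.5, 1.0, size=(500, 2))

# Plug-in density-ratio estimate w(x) = p_t(x) / p_s(x) from two KDEs.
kde_s = KernelDensity(bandwidth=0.5).fit(x_source)
kde_t = KernelDensity(bandwidth=0.5).fit(x_target)
log_w = kde_t.score_samples(x_source) - kde_s.score_samples(x_source)
weights = np.exp(log_w)

# Normalize the weights; they would then be used, for example, as sample
# weights when training a classifier on the source domain.
weights /= weights.mean()
print("weight range:", weights.min(), weights.max())
```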
Feature-Based Adaptation
These approaches transform the original features into a new feature space that reduces the domain gap while preserving the underlying structure of the original data.

Subspace-based methods construct a common intermediate representation shared between domains. Sampling Geodesic Flow (SGF) builds a geodesic path between the source and target points on a Grassmann manifold and projects data onto subspaces sampled along this path. Subspace Alignment (SA) learns a linear mapping that aligns the source subspace directly with the target subspace on the Grassmann manifold (a sketch of this recipe appears at the end of this subsection).

Transformation-based methods transform features to minimize discrepancies, often using metrics such as MMD or KL-divergence. Transfer Component Analysis (TCA) learns a domain-invariant feature transformation by minimizing the MMD between marginal distributions in RKHS. Joint Distribution Adaptation (JDA) extends TCA by simultaneously matching both marginal and conditional distributions.

Reconstruction-based methods reduce domain disparity through sample reconstruction in an intermediate feature representation. Robust Visual Domain Adaptation with Low-Rank Reconstruction (RDALR) learns a linear projection that maps source samples into a space where they can be linearly represented by target domain samples.
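As a small worked example of the subspace-based strategy, the following sketch implements the basic Subspace Alignment recipe: PCA bases for each domain and an alignment matrix M = X_s^T X_t that maps the source basis toward the target basis. The toy Gaussian data, subspace dimension, and use of scikit-learn's PCA are assumptions, and the code is a simplification of the published method.

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace_alignment(source, target, dim=10):
    # Center each domain, as PCA does internally.
    source = source - source.mean(axis=0)
    target = target - target.mean(axis=0)
    # PCA bases: columns are the leading principal directions.
    ps = PCA(n_components=dim).fit(source).components_.T   # (d, dim)
    pt = PCA(n_components=dim).fit(target).components_.T   # (d, dim)
    # Alignment matrix M = X_s^T X_t maps the source basis toward the target basis.
    m = ps.T @ pt                                           # (dim, dim)
    source_aligned = source @ ps @ m    # source data in the aligned subspace
    target_proj = target @ pt           # target data in its own subspace
    return source_aligned, target_proj

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(300, 50))
xt = rng.normal(0.3, 1.2, size=(300, 50))
zs, zt = subspace_alignment(xs, xt, dim=10)
print(zs.shape, zt.shape)
```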
Deep Domain Adaptation
Deep DA techniques leverage neural networks to minimize the domain gap, often employing convolutional, autoencoder, or adversarial-based architectures. These techniques are categorized into discrepancy-based, reconstruction-based, and adversarial-based adaptation.
Discrepancy-Based Adaptation
Deep Adaptation Network (DAN) employs deep neural networks and Multiple Kernel MMD (MK-MMD) to align marginal distributions across domains. Deep Transfer Network (DTN) matches both marginal and conditional distributions using shared feature extraction layers and discriminative layers with classifier transduction.
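A minimal PyTorch-style sketch of the discrepancy-based idea follows: a shared feature extractor is trained with a source classification loss plus an MMD penalty between source and target features. It uses a single RBF kernel rather than the multi-kernel MK-MMD of DAN, and the architecture, random batch data, and trade-off weight are illustrative assumptions rather than any published configuration.

```python
import torch
import torch.nn as nn

def mmd2(fs, ft, gamma=1.0):
    # Biased squared MMD with a single RBF kernel (DAN uses a multi-kernel variant).
    def k(a, b):
        return torch.exp(-gamma * torch.cdist(a, b).pow(2))
    return k(fs, fs).mean() - 2 * k(fs, ft).mean() + k(ft, ft).mean()

feature_extractor = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
classifier = nn.Linear(32, 2)
optimizer = torch.optim.Adam(
    list(feature_extractor.parameters()) + list(classifier.parameters()), lr=1e-3)

xs, ys = torch.randn(64, 20), torch.randint(0, 2, (64,))  # labeled source batch
xt = torch.randn(64, 20)                                   # unlabeled target batch

# One training step: source classification loss plus domain discrepancy penalty.
fs, ft = feature_extractor(xs), feature_extractor(xt)
loss = nn.functional.cross_entropy(classifier(fs), ys) + 1.0 * mmd2(fs, ft)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```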
Reconstruction-Based Adaptation
These methods use autoencoders to align domains by minimizing reconstruction error and learning invariant representations. Stacked Denoising Autoencoders (SDA) extract high-level features from all available domains in an unsupervised manner. Deep Reconstruction-Classification Network (DRCN) uses an encoder-decoder architecture in which a shared encoder feeds both a classifier that predicts source labels and a decoder that reconstructs target samples.
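The following sketch captures the DRCN-style objective in PyTorch: a shared encoder feeds a classifier trained on labeled source data and a decoder trained to reconstruct unlabeled target data. The layer sizes, random batches, and loss weighting are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

# Shared encoder with a classification branch (source) and a decoder branch (target).
encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 16))
classifier = nn.Linear(16, 2)
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 20))

params = (list(encoder.parameters()) + list(classifier.parameters())
          + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

xs, ys = torch.randn(64, 20), torch.randint(0, 2, (64,))  # labeled source batch
xt = torch.randn(64, 20)                                   # unlabeled target batch

# Joint objective: classify source samples and reconstruct target samples.
cls_loss = nn.functional.cross_entropy(classifier(encoder(xs)), ys)
rec_loss = nn.functional.mse_loss(decoder(encoder(xt)), xt)
loss = cls_loss + 0.5 * rec_loss   # trade-off weight is an illustrative choice

optimizer.zero_grad()
loss.backward()
optimizer.step()
```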
Adversarial-Based Adaptation
These approaches minimize the distribution discrepancy between domains to obtain transferable, domain-invariant features, drawing inspiration from Generative Adversarial Networks (GANs). The Gradient Reversal Layer (GRL) reverses gradients during backpropagation so that the feature extractor is pushed toward domain-invariant features (a sketch follows this paragraph). Multi-Adversarial Domain Adaptation (MADA) uses multiple class-wise domain discriminators to enable fine-grained alignment of class distributions. Visual adversarial domain adaptation applies GANs at the pixel level (PixelDA, SimGAN), the feature level (e.g., DANN), or both (CyCADA).
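To make the GRL idea concrete, the sketch below implements a gradient reversal layer as a custom PyTorch autograd function and uses it in a single domain-discriminator step; the module sizes, random batches, and reversal strength are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies gradients by -lambda on the way back.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
domain_discriminator = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))

xs, xt = torch.randn(64, 20), torch.randn(64, 20)          # source / target batches
features = feature_extractor(torch.cat([xs, xt]))
domain_labels = torch.cat([torch.zeros(64, dtype=torch.long),
                           torch.ones(64, dtype=torch.long)])

# The discriminator learns to tell domains apart, while the reversed gradient
# pushes the feature extractor toward domain-invariant representations.
logits = domain_discriminator(GradReverse.apply(features, 1.0))
domain_loss = nn.functional.cross_entropy(logits, domain_labels)
domain_loss.backward()
```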
Conclusion
The paper presents a useful overview of the field of domain adaptation. It covers the motivations, categorizations, and common techniques used in domain adaptation. It is limited in scope by its focus on unsupervised domain adaptation. Furthermore, the overview of existing techniques is necessarily brief, and the reader is referred to the cited papers for more detailed information.