- The paper presents unsupervised domain adaptation methods focused on aligning source and target data distributions.
- It categorizes approaches into shallow and deep strategies, utilizing distance metrics and adversarial techniques to mitigate domain shifts.
- The review notes limitations, such as the paper's narrow focus on unsupervised adaptation and its brief treatment of individual techniques, and suggests the need for more detailed follow-up studies.
Domain Adaptation: A Review
This paper provides a concise overview of domain adaptation (DA), a subfield of machine learning focused on addressing the challenges posed when training and test data originate from different distributions. The paper emphasizes unsupervised domain adaptation (UDA), where labels are available only in the source domain.
Background and Motivation
Traditional machine learning assumes that training and test data are drawn from identical distributions. This assumption is often violated in real-world scenarios due to factors such as evolving data characteristics or disparate data sources. DA techniques aim to mitigate the performance degradation that occurs when models trained on one distribution (the source domain) are applied to a different distribution (the target domain). DA is presented as a specific instance of transfer learning, where the primary distinction lies in the scope of change between source and target: transfer learning addresses changes in both tasks and domains, while DA focuses solely on domain shifts.
Domain Adaptation Taxonomy
The paper categorizes DA approaches based on the relationship between source and target label spaces, identifying four main categories:
- Closed Set DA: Source and target domains share the same classes, but their probability distributions differ.
- Open Set DA: The domains share some labels, but each also has private labels. In modified open set DA, the source label set is a subset of the target label set.
- Partial DA: The target label set is a subset of the source label set.
- Universal DA: No prior knowledge of the label sets is required; the domains may share some labels while each may also hold private labels. Universal DA aims to identify the shared label space and align the distributions accordingly.
Additionally, the paper identifies three types of domain shift:
- Prior Shift: Posterior distributions are equivalent (p_s(y|x) = p_t(y|x)), but prior class distributions differ (p_s(y) ≠ p_t(y)).
- Covariate Shift: Marginal probability distributions differ (p_s(x) ≠ p_t(x)), while conditional probability distributions remain constant (p_s(y|x) = p_t(y|x)); a minimal sketch of this case follows the list.
- Concept Shift: Data distributions remain unchanged (p_s(x) = p_t(x)), but conditional distributions differ (p_s(y|x) ≠ p_t(y|x)).
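As a concrete illustration of the covariate shift case (not taken from the paper), the following minimal NumPy sketch generates two domains that share the labeling rule p(y|x) while their input distributions p(x) differ; the distribution parameters and function names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    # Shared labeling rule: p(y | x) is identical in both domains.
    return (x[:, 0] + x[:, 1] > 0).astype(int)

# Covariate shift: the marginal p(x) differs between source and target.
x_source = rng.normal(loc=-1.0, scale=1.0, size=(1000, 2))
x_target = rng.normal(loc=1.5, scale=0.5, size=(1000, 2))

y_source = label(x_source)   # labels available in the source domain
y_target = label(x_target)   # held out in the unsupervised setting

print("source positive rate:", y_source.mean())
print("target positive rate:", y_target.mean())
```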
Domain Adaptation Approaches
The paper categorizes existing DA approaches into shallow and deep methods.
Shallow Domain Adaptation
Shallow DA techniques primarily leverage instance-based and feature-based adaptation strategies to align domain distributions. Distance metrics such as Maximum Mean Discrepancy (MMD), Wasserstein metric, Correlation Alignment (CORAL), Kullback-Leibler (KL) divergence, and Contrastive Domain Discrepancy (CDD) are commonly used to quantify and minimize the disparity between domains.
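For readers unfamiliar with MMD, the sketch below gives a minimal NumPy implementation of a biased squared-MMD estimate with a single RBF kernel. It is an illustration of the discrepancy measure itself, not the estimator used by any particular method in the paper; the function names, bandwidth, and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=1.0):
    # Biased estimate of squared MMD:
    # MMD^2 = E[k(s, s')] - 2 E[k(s, t)] + E[k(t, t')].
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss - 2 * k_st + k_tt

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 5))   # source sample
xt = rng.normal(0.5, 1.0, size=(200, 5))   # shifted target sample
print("MMD^2 estimate:", mmd2(xs, xt))
```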
Instance-Based Adaptation
These methods address domain shift by re-weighting source domain samples according to the ratio of the target and source domain densities, w(x) = p_t(x) / p_s(x). Kernel Mean Matching (KMM) is a non-parametric method that estimates these weights by minimizing the MMD in a Reproducing Kernel Hilbert Space (RKHS). The Kullback-Leibler Importance Estimation Procedure (KLIEP) estimates the density ratio directly by minimizing the KL-divergence between the target distribution and the weighted source distribution.
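KMM and KLIEP estimate the weights without modeling the densities explicitly; as a much simpler stand-in that only illustrates the re-weighting idea, the sketch below forms a naive plug-in estimate of w(x) = p_t(x) / p_s(x) from two kernel density estimates. The data, bandwidth, and use of scikit-learn's KernelDensity are assumptions, and this is not the KMM or KLIEP procedure itself.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x_source = rng.normal(-0.5, 1.0, size=(500, 2))
x_target = rng.normal(0.5, 1.0, size=(500, 2))

# Plug-in density-ratio estimate w(x) = p_t(x) / p_s(x) from two KDEs.
kde_s = KernelDensity(bandwidth=0.5).fit(x_source)
kde_t = KernelDensity(bandwidth=0.5).fit(x_target)
log_w = kde_t.score_samples(x_source) - kde_s.score_samples(x_source)
weights = np.exp(log_w)

# Normalize the weights; they would then be used, for example, as sample
# weights when training a classifier on the source domain.
weights /= weights.mean()
print("weight range:", weights.min(), weights.max())
```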
Feature-Based Adaptation
These approaches transform the original features into a new feature space that reduces the domain gap while preserving the underlying structure of the original data.

Subspace-based methods construct a common intermediate representation shared between domains. Sampling Geodesic Flow (SGF) builds a geodesic path between the source and target points on a Grassmann manifold and projects data onto subspaces sampled along this path. Subspace Alignment (SA) learns a linear mapping that aligns the source subspace directly with the target subspace on the Grassmann manifold (a sketch of this recipe appears at the end of this subsection).

Transformation-based methods transform features to minimize discrepancies, often using metrics such as MMD or KL-divergence. Transfer Component Analysis (TCA) learns a domain-invariant feature transformation by minimizing the MMD between marginal distributions in RKHS. Joint Distribution Adaptation (JDA) extends TCA by simultaneously matching both marginal and conditional distributions.

Reconstruction-based methods reduce domain disparity through sample reconstruction in an intermediate feature representation. Robust Visual Domain Adaptation with Low-Rank Reconstruction (RDALR) learns a linear projection that maps source samples into a space where they can be linearly represented by target domain samples.
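As a small worked example of the subspace-based strategy, the following sketch implements the basic Subspace Alignment recipe: PCA bases for each domain and an alignment matrix M = X_s^T X_t that maps the source basis toward the target basis. The toy Gaussian data, subspace dimension, and use of scikit-learn's PCA are assumptions, and the code is a simplification of the published method.

```python
import numpy as np
from sklearn.decomposition import PCA

def subspace_alignment(source, target, dim=10):
    # Center each domain, as PCA does internally.
    source = source - source.mean(axis=0)
    target = target - target.mean(axis=0)
    # PCA bases: columns are the leading principal directions.
    ps = PCA(n_components=dim).fit(source).components_.T   # (d, dim)
    pt = PCA(n_components=dim).fit(target).components_.T   # (d, dim)
    # Alignment matrix M = X_s^T X_t maps the source basis toward the target basis.
    m = ps.T @ pt                                           # (dim, dim)
    source_aligned = source @ ps @ m    # source data in the aligned subspace
    target_proj = target @ pt           # target data in its own subspace
    return source_aligned, target_proj

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(300, 50))
xt = rng.normal(0.3, 1.2, size=(300, 50))
zs, zt = subspace_alignment(xs, xt, dim=10)
print(zs.shape, zt.shape)
```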
Deep Domain Adaptation
Deep DA techniques leverage neural networks to minimize the domain gap, often employing convolutional, autoencoder, or adversarial-based architectures. These techniques are categorized into discrepancy-based, reconstruction-based, and adversarial-based adaptation.
Discrepancy-Based Adaptation
Deep Adaptation Network (DAN) employs deep neural networks and Multiple Kernel MMD (MK-MMD) to align marginal distributions across domains. Deep Transfer Network (DTN) matches both marginal and conditional distributions using shared feature extraction layers and discriminative layers with classifier transduction.
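A minimal PyTorch-style sketch of the discrepancy-based idea follows: a shared feature extractor is trained with a source classification loss plus an MMD penalty between source and target features. It uses a single RBF kernel rather than the multi-kernel MK-MMD of DAN, and the architecture, random batch data, and trade-off weight are illustrative assumptions rather than any published configuration.

```python
import torch
import torch.nn as nn

def mmd2(fs, ft, gamma=1.0):
    # Biased squared MMD with a single RBF kernel (DAN uses a multi-kernel variant).
    def k(a, b):
        return torch.exp(-gamma * torch.cdist(a, b).pow(2))
    return k(fs, fs).mean() - 2 * k(fs, ft).mean() + k(ft, ft).mean()

feature_extractor = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
classifier = nn.Linear(32, 2)
optimizer = torch.optim.Adam(
    list(feature_extractor.parameters()) + list(classifier.parameters()), lr=1e-3)

xs, ys = torch.randn(64, 20), torch.randint(0, 2, (64,))  # labeled source batch
xt = torch.randn(64, 20)                                   # unlabeled target batch

# One training step: source classification loss plus domain discrepancy penalty.
fs, ft = feature_extractor(xs), feature_extractor(xt)
loss = nn.functional.cross_entropy(classifier(fs), ys) + 1.0 * mmd2(fs, ft)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```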
Reconstruction-Based Adaptation
These methods use autoencoders to align domains by minimizing reconstruction error and learning invariant representations. Stacked Denoising Autoencoders (SDA) extract high-level features from all available domains in an unsupervised manner. Deep Reconstruction-Classification Network (DRCN) uses an encoder-decoder architecture in which a shared encoder feeds both a classifier that predicts source labels and a decoder that reconstructs target samples.
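The following sketch captures the DRCN-style objective in PyTorch: a shared encoder feeds a classifier trained on labeled source data and a decoder trained to reconstruct unlabeled target data. The layer sizes, random batches, and loss weighting are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

# Shared encoder with a classification branch (source) and a decoder branch (target).
encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 16))
classifier = nn.Linear(16, 2)
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 20))

params = (list(encoder.parameters()) + list(classifier.parameters())
          + list(decoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

xs, ys = torch.randn(64, 20), torch.randint(0, 2, (64,))  # labeled source batch
xt = torch.randn(64, 20)                                   # unlabeled target batch

# Joint objective: classify source samples and reconstruct target samples.
cls_loss = nn.functional.cross_entropy(classifier(encoder(xs)), ys)
rec_loss = nn.functional.mse_loss(decoder(encoder(xt)), xt)
loss = cls_loss + 0.5 * rec_loss   # trade-off weight is an illustrative choice

optimizer.zero_grad()
loss.backward()
optimizer.step()
```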
Adversarial-Based Adaptation
These approaches minimize the distribution discrepancy between domains to obtain transferable, domain-invariant features, drawing inspiration from Generative Adversarial Networks (GANs). The Gradient Reversal Layer (GRL) reverses gradients during backpropagation so that the feature extractor is pushed toward domain-invariant features (a sketch follows this paragraph). Multi-Adversarial Domain Adaptation (MADA) uses multiple class-wise domain discriminators to enable fine-grained alignment of class distributions. Visual adversarial domain adaptation applies GANs at the pixel level (PixelDA, SimGAN), the feature level (e.g., DANN), or both (CyCADA).
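To make the GRL idea concrete, the sketch below implements a gradient reversal layer as a custom PyTorch autograd function and uses it in a single domain-discriminator step; the module sizes, random batches, and reversal strength are placeholder assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; multiplies gradients by -lambda on the way back.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
domain_discriminator = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))

xs, xt = torch.randn(64, 20), torch.randn(64, 20)          # source / target batches
features = feature_extractor(torch.cat([xs, xt]))
domain_labels = torch.cat([torch.zeros(64, dtype=torch.long),
                           torch.ones(64, dtype=torch.long)])

# The discriminator learns to tell domains apart, while the reversed gradient
# pushes the feature extractor toward domain-invariant representations.
logits = domain_discriminator(GradReverse.apply(features, 1.0))
domain_loss = nn.functional.cross_entropy(logits, domain_labels)
domain_loss.backward()
```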
Conclusion
The paper presents a useful overview of the field of domain adaptation. It covers the motivations, categorizations, and common techniques used in domain adaptation. It is limited in scope by its focus on unsupervised domain adaptation. Furthermore, the overview of existing techniques is necessarily brief, and the reader is referred to the cited papers for more detailed information.