k-Mixup Regularization for Deep Learning via Optimal Transport (2106.02933v2)

Published 5 Jun 2021 in cs.LG

Abstract: Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to \emph{$k$-mixup}, which perturbs $k$-batches of training points in the direction of other $k$-batches. The perturbation is done with displacement interpolation, i.e. interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the $k$-mixup case. Our empirical results show that training with $k$-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of $k$-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, $k$-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.

Authors (5)

Kristjan Greenewald (65 papers)
Anming Gu (5 papers)
Mikhail Yurochkin (68 papers)
Justin Solomon (86 papers)
Edward Chien (9 papers)

Citations (17)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

k-Mixup Regularization for Deep Learning via Optimal Transport (2106.02933v2)

Summary

Related Papers