
Data Augmentation Can Improve Robustness (2111.05328v1)

Published 9 Nov 2021 in cs.CV, cs.LG, and stat.ML

Abstract: Adversarial training suffers from robust overfitting, a phenomenon where the robust test accuracy starts to decrease during training. In this paper, we focus on reducing robust overfitting by using common data augmentation schemes. We demonstrate that, contrary to previous findings, when combined with model weight averaging, data augmentation can significantly boost robust accuracy. Furthermore, we compare various augmentations techniques and observe that spatial composition techniques work the best for adversarial training. Finally, we evaluate our approach on CIFAR-10 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively. We show large absolute improvements of +2.93% and +2.16% in robust accuracy compared to previous state-of-the-art methods. In particular, against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model reaches 60.07% robust accuracy without using any external data. We also achieve a significant performance boost with this approach while using other architectures and datasets such as CIFAR-100, SVHN and TinyImageNet.

Citations (233)

Summary

  • The paper demonstrates that combining data augmentation with model weight averaging effectively reduces robust overfitting and improves test-time adversarial robustness.
  • The study highlights that spatial composition techniques like CutMix significantly enhance robustness, achieving up to 60.07% robust accuracy on CIFAR-10 under ℓ∞ perturbations of size ε = 8/255.
  • Extensive ablation studies reveal that model weight averaging of diverse training snapshots boosts performance while maintaining computational efficiency.

Enhancing Adversarial Robustness through Data Augmentation and Model Weight Averaging

The discussed paper critically evaluates data augmentation combined with model weight averaging as a means of improving the adversarial robustness of neural networks. While adversarial training is the prevailing method for enhancing robustness, it often leads to robust overfitting, a decline in test-time robustness even as training robustness continues to improve. This work addresses that overfitting by pairing data augmentation with model weight averaging, in contrast to prior work that found minimal gains from data augmentation in adversarial settings.
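Adversarial training solves an inner maximization at each step, crafting a worst-case perturbation of each input before the weight update. The following is a minimal sketch of a PGD-style ℓ∞ attack of the kind used in such training loops; a toy logistic-regression model stands in for a neural network, and the function name and step sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss within an l_inf ball.

    x: (n, d) inputs in [0, 1]; y: (n,) labels in {0, 1};
    (w, b): logistic-regression parameters standing in for a network.
    """
    x_adv = x.copy()
    for _ in range(steps):
        # Gradient of the logistic loss w.r.t. the input.
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = np.outer(p - y, w)                 # (n, d)
        x_adv = x_adv + alpha * np.sign(grad)     # ascent step
        # Project back into the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
    return x_adv
```

The outer loop of adversarial training would then compute the weight gradient on `x_adv` instead of `x`.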

Several core insights emerge from this research. The authors show that adversarially trained models benefit from the combined effect of data augmentation and model weight averaging, which mitigates robust overfitting. Initial attempts with augmentation techniques like Cutout and MixUp showed minimal gains, motivating further investigation into spatial composition augmentations. The authors demonstrate that CutMix, which pastes a randomly chosen patch from one image onto another and mixes the labels in proportion to the patch area, yields the largest robustness improvements when paired with model weight averaging.
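The patch-swapping idea behind CutMix can be sketched as follows for a single image pair; the function name and the Beta(α, α) patch-size sampling are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, rng, alpha=1.0):
    """Paste a random rectangular patch from img_b into img_a.

    Images are HxWxC arrays in [0, 1]; labels are one-hot vectors,
    mixed in proportion to the area of the pasted patch.
    """
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                   # target mixing ratio
    cut_h = int(h * np.sqrt(1 - lam))
    cut_w = int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)      # patch center
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[top:bottom, left:right] = img_b[top:bottom, left:right]
    # Recompute lambda from the actual (possibly clipped) patch area.
    lam = 1 - (bottom - top) * (right - left) / (h * w)
    return mixed, lam * label_a + (1 - lam) * label_b
```

In an adversarial training loop, the augmentation is applied to each batch before the attack and weight update.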

In empirical tests on CIFAR-10 against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, models trained with CutMix reach a robust accuracy of 60.07%, outperforming previous state-of-the-art techniques by 2.93 percentage points. The improvements carry over to other architectures and datasets, including CIFAR-100, SVHN, and TinyImageNet, and generalize to other threat models such as $\ell_2$ norm-bounded perturbations.

A more nuanced contribution is the observation that robust performance improves when model snapshots taken at different training iterations, individually similar in accuracy but differing in their predictions, are averaged. The authors propose that this prediction diversity is induced by the data augmentations and that averaging over it yields a more robust overall predictor.

This finding is reinforced by model ensembling experiments showing that averaging the predictions of independently trained models boosts robustness. Weight averaging extends the same principle by aggregating parameter states from different points of a single training run, at little additional computational cost.
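Model weight averaging can be sketched as an exponential moving average maintained alongside the online weights; the decay value below is illustrative, not necessarily the paper's setting.

```python
import numpy as np

def ema_update(avg_params, params, decay=0.995):
    """One weight-averaging step: move the averaged copy of each
    parameter tensor slightly toward the current training weights."""
    return {name: decay * avg + (1 - decay) * params[name]
            for name, avg in avg_params.items()}
```

At evaluation time, the averaged parameters are used in place of the most recent ones, which smooths over the late-training fluctuations associated with robust overfitting.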

Comprehensive ablation studies identify the best-performing configurations: spatial composition augmentations benefit most from large window lengths and are better suited to adversarial training than blending techniques such as MixUp.

Theoretically, the work deepens our understanding of how training strategies contribute to robustness. Practically, it points to computationally efficient, robustness-enhancing methods that require no external data. This matters for settings such as autonomous systems and other safety-critical services, where robustness is pivotal but computational resources are constrained.

In summary, combining data augmentation strategies, especially spatial composition techniques like CutMix, with model weight averaging significantly enhances adversarial robustness across a range of models and datasets. Future work may investigate which combinations of augmentations further optimize robust performance, or whether the approach transfers to other robustness challenges in machine learning.
