BUZz: BUffer Zones for defending adversarial examples in image classification (1910.02785v2)
Abstract: We propose a novel defense against all existing gradient-based adversarial attacks on deep neural networks for image classification. Our defense combines deep neural networks with simple image transformations. While straightforward to implement, this defense yields a unique security property that we term buffer zones. We argue that our defense based on buffer zones offers significant improvements over state-of-the-art defenses. We achieve this improvement even when the adversary has access to the entire original training data set and unlimited query access to the defense. We verify our claim through experiments on Fashion-MNIST and CIFAR-10: we demonstrate an attack success rate of $<11\%$, significantly lower than what other well-known state-of-the-art defenses offer, at the price of only an $11\%$ to $18\%$ drop in clean accuracy. Using a new intuitive metric, we explain why this trade-off offers a significant improvement over prior work.
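To make the idea of a buffer zone concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes (the abstract does not say so) that several classifiers, each behind its own simple image transformation, vote, and that an input is accepted only when the vote is unanimous. The names `REJECT`, `make_member`, `buzz_predict`, and the brightness-threshold classifier are hypothetical and chosen purely for illustration; a real defense would use trained networks instead.

```python
from typing import Callable, List

Image = List[float]              # toy image: flat list of pixel values in [0, 1]
Member = Callable[[Image], int]  # one protected classifier: image -> class label

REJECT = -1  # hypothetical "no decision" label for inputs in the buffer zone


def toy_classifier(image: Image) -> int:
    """Stand-in for a trained network: class 1 if mean brightness > 0.5, else 0."""
    return 1 if sum(image) / len(image) > 0.5 else 0


def make_member(transform: Callable[[Image], Image]) -> Member:
    """Put a simple image transformation in front of the (toy) classifier."""
    return lambda image: toy_classifier(transform(image))


def buzz_predict(members: List[Member], image: Image) -> int:
    """Assumed voting rule: return the label only if all members agree."""
    votes = [member(image) for member in members]
    return votes[0] if all(v == votes[0] for v in votes) else REJECT


# Three members, each seeing the input through a different simple transformation.
members = [
    make_member(lambda img: list(img)),                        # identity
    make_member(lambda img: [min(1.0, p + 0.1) for p in img]),  # brighten, clipped
    make_member(lambda img: [0.8 * p for p in img]),            # darken by scaling
]

print(buzz_predict(members, [0.9] * 4))   # clearly bright: all members agree
print(buzz_predict(members, [0.1] * 4))   # clearly dark: all members agree
print(buzz_predict(members, [0.55] * 4))  # near the boundary: members disagree
```

In this toy setup the transformations shift each member's decision boundary slightly, so inputs near any boundary produce disagreement and are rejected. That rejected region plays the role of a buffer zone: a small perturbation that crosses one member's boundary lands in the rejected region rather than in another class, and reaching unanimous agreement on a wrong label requires a larger perturbation.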