
Adversarial examples in the physical world

(1607.02533)
Published Jul 8, 2016 in cs.CV, cs.CR, cs.LG, and stat.ML

Abstract

Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as an input. This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.

Overview

  • Machine learning classifiers are vulnerable to adversarial examples, which are subtly modified inputs designed to cause misclassification.

  • Adversarial examples can deceive classifiers even when perceived through a camera in the physical world, not just in purely digital settings.

  • The 'fast' adversarial method is more robust to physical-world transformations than 'iterative' methods.

  • Artificial transformations such as blurring and changing brightness do not necessarily neutralize adversarial properties.

  • The research urges the development of defenses against adversarial attacks in real-world applications of machine learning.

Introduction to Adversarial Examples

Machine learning classifiers, despite their progress and utility, remain highly susceptible to adversarial examples. Adversarial examples are inputs deliberately modified in small ways that cause the learning model to misclassify them. While these alterations can be imperceptible to humans, they can significantly mislead classifiers. Concerns regarding the security implications of such adversarial examples have grown, especially because they can be crafted without detailed knowledge of the target model.
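
To make the idea of a "slight modification" concrete, the paper's 'fast' method (the fast gradient sign method) perturbs an image one step in the direction of the sign of the loss gradient. Below is a minimal sketch, assuming a differentiable PyTorch classifier `model`, an input batch `x` with pixels in [0, 1], integer labels `y_true`, and an illustrative `epsilon`; these names and values are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, epsilon=0.007):
    """Fast gradient sign method: x_adv = x + epsilon * sign(grad_x J(x, y_true))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)   # J(x, y_true)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()        # single step in the sign of the gradient
    return x_adv.clamp(0, 1).detach()          # keep pixels in the valid range
```

Because it takes only a single, relatively coarse step, this method tends to produce perturbations that are less finely tuned to the exact pixel values, which is relevant to the robustness results discussed below.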

Adversarial Threats in Physical Settings

Prior work has largely assumed that adversarial threats exist within the digital realm, where the adversary feeds input directly into the classifier. However, real-world applications often involve systems processing inputs from the physical environment, such as cameras and other sensors. This paper presents evidence that adversarial examples retain their deceptive properties even when captured through a camera, demonstrating their viability in the physical world. The researchers illustrate this by printing adversarially perturbed images, photographing the printouts with a cell-phone camera, and feeding the photos to a pre-trained ImageNet Inception classifier; a large fraction of these images remained misclassified.

Experimental Insights

The experiments yield several notable findings. The 'fast' adversarial method proved more resilient to physical transfer (printing and photographing) than the 'iterative' methods, likely because the iterative methods rely on subtler perturbations that are more easily destroyed. Contrary to expectations, adding noise, blurring, and quality degradation did not guarantee destruction of the adversarial properties. Likewise, artificial transformations such as contrast and brightness adjustments had little effect on adversarial effectiveness. The researchers also demonstrated a black-box adversarial attack in the physical world using a mobile phone app, suggesting that deployed real-world systems could be at risk.
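
For contrast with the fast method, the 'iterative' variant discussed above applies the fast step repeatedly with a small step size and clips the result back into an epsilon-neighbourhood of the original image after every step. The sketch below reuses the assumptions from the earlier snippet (a PyTorch `model`, pixels in [0, 1]); the step size `alpha` and the number of steps are illustrative values, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def iterative_fgsm(model, x, y_true, epsilon=0.03, alpha=0.005, steps=10):
    """Repeated FGSM steps, clipped to an epsilon-ball around the original image."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        # project back into the epsilon-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x_orig - epsilon), x_orig + epsilon)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

The finer, more precisely tuned perturbations this produces help explain why, in the experiments, the iterative examples survived printing and photographing less often than the fast ones.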

Concluding Observations

The findings underscore the threat adversarial examples pose to machine learning systems in physical settings, challenging the prevailing view that these attacks are purely digital phenomena. The study shows that an attacker can generate adversarial examples that remain misclassified even after undergoing real-world transformations such as printing and photographing. Ultimately, this work highlights the urgency of developing robust defenses for machine learning systems that operate amid the complexities of physical environments.
