
Adversarial examples in the physical world

(1607.02533)
Published Jul 8, 2016 in cs.CV, cs.CR, cs.LG, and stat.ML

Abstract

Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as an input. This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.

Overview

  • Machine learning classifiers are vulnerable to adversarial examples, which are subtly modified inputs designed to cause misclassification.

  • Adversarial examples can deceive classifiers even when perceived through a camera in the physical world, not just in purely digital settings.

  • The 'fast' adversarial method is more robust to physical-world transformations than 'iterative' methods.

  • Artificial transformations such as blurring and changing brightness do not necessarily neutralize adversarial properties.

  • The research urges the development of defenses against adversarial attacks in real-world applications of machine learning.

Introduction to Adversarial Examples

Machine learning classifiers, despite their progress and utility, remain highly susceptible to adversarial examples. Adversarial examples are inputs deliberately modified in small ways that cause the learning model to misclassify them. While these alterations can be imperceptible to humans, they can significantly mislead classifiers. Concerns regarding the security implications of such adversarial examples have grown, especially because they can be crafted without detailed knowledge of the target model.
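
To make the idea of a "slight modification" concrete, the paper's 'fast' method (the fast gradient sign method) perturbs an image one step in the direction of the sign of the loss gradient. Below is a minimal sketch, assuming a differentiable PyTorch classifier `model`, an input batch `x` with pixels in [0, 1], integer labels `y_true`, and an illustrative `epsilon`; these names and values are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, epsilon=0.007):
    """Fast gradient sign method: x_adv = x + epsilon * sign(grad_x J(x, y_true))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y_true)   # J(x, y_true)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()        # single step in the sign of the gradient
    return x_adv.clamp(0, 1).detach()          # keep pixels in the valid range
```

Because it takes only a single, relatively coarse step, this method tends to produce perturbations that are less finely tuned to the exact pixel values, which is relevant to the robustness results discussed below.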

Adversarial Threats in Physical Settings

Prior work has largely assumed that adversarial threats exist within the digital realm, where the adversary feeds input directly into the classifier. However, real-world applications often involve systems processing inputs from the physical environment, such as cameras and other sensors. This paper presents evidence that adversarial examples retain their deceptive properties even when captured through a camera, demonstrating their viability in the physical world. The researchers illustrate this by printing adversarially perturbed images, photographing the printouts with a cell-phone camera, and feeding the photos to a pre-trained ImageNet Inception classifier; a large fraction of these images remained misclassified.

Experimental Insights

The experiments yield several notable findings. The 'fast' adversarial method proved more resilient to physical transfer (printing and photographing) than the 'iterative' methods, likely because the iterative methods rely on subtler perturbations that are more easily destroyed. Contrary to expectations, adding noise, blurring, and quality degradation did not guarantee destruction of the adversarial properties. Likewise, artificial transformations such as contrast and brightness adjustments had little effect on adversarial effectiveness. The researchers also demonstrated a black-box adversarial attack in the physical world using a mobile phone app, suggesting that deployed real-world systems could be at risk.
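
For contrast with the fast method, the 'iterative' variant discussed above applies the fast step repeatedly with a small step size and clips the result back into an epsilon-neighbourhood of the original image after every step. The sketch below reuses the assumptions from the earlier snippet (a PyTorch `model`, pixels in [0, 1]); the step size `alpha` and the number of steps are illustrative values, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def iterative_fgsm(model, x, y_true, epsilon=0.03, alpha=0.005, steps=10):
    """Repeated FGSM steps, clipped to an epsilon-ball around the original image."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        # project back into the epsilon-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x_orig - epsilon), x_orig + epsilon)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

The finer, more precisely tuned perturbations this produces help explain why, in the experiments, the iterative examples survived printing and photographing less often than the fast ones.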

Concluding Observations

The findings underscore the threat adversarial examples pose to machine learning systems in physical settings, challenging the prevailing view that these attacks are purely digital phenomena. The study shows that an attacker can generate adversarial examples that remain misclassified even after undergoing real-world transformations such as printing and photographing. Ultimately, this work highlights the urgency of developing robust defenses for machine learning systems that operate amid the complexities of physical environments.
