Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization (1708.08689v1)

Published 29 Aug 2017 in cs.LG

Abstract: A number of online services nowadays rely upon machine learning to extract valuable information from data collected in the wild. This exposes learning algorithms to the threat of data poisoning, i.e., a coordinated attack in which a fraction of the training data is controlled by the attacker and manipulated to subvert the learning process. To date, these attacks have been devised only against a limited class of binary learning algorithms, due to the inherent complexity of the gradient-based procedure used to optimize the poisoning points (a.k.a. adversarial training examples). In this work, we first extend the definition of poisoning attacks to multiclass problems. We then propose a novel poisoning algorithm based on the idea of back-gradient optimization, i.e., to compute the gradient of interest through automatic differentiation, while also reversing the learning procedure to drastically reduce the attack complexity. Compared to current poisoning strategies, our approach is able to target a wider class of learning algorithms, trained with gradient-based procedures, including neural networks and deep learning architectures. We empirically evaluate its effectiveness on several application examples, including spam filtering, malware detection, and handwritten digit recognition. We finally show that, similarly to adversarial test examples, adversarial training examples can also be transferred across different learning algorithms.

Citations (598)

Summary

  • The paper introduces a novel back-gradient optimization technique for efficiently generating adversarial poisoning attacks on deep learning models.
  • It extends the threat model to multiclass systems by distinguishing between error-generic and error-specific attacks, enhancing our understanding of data poisoning vulnerabilities.
  • Experiments on tasks like spam detection and MNIST show that even a small set of adversarial examples can significantly degrade model performance and transfer across different algorithms.

"Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization"

Introduction

The paper "Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization" addresses the evolving threat of data poisoning in machine learning systems. By exploiting vulnerabilities inherent in the training data, adversaries can subvert learning processes. The research extends existing frameworks to consider multiclass systems and neural networks and introduces a novel algorithm based on back-gradient optimization for generating adversarial training examples.

Extending the Threat Model

The researchers expand the definition of data poisoning to include multiclass classification, which poses a greater challenge due to the increased complexity of potential misclassifications. They elaborate on the distinctions between error-generic and error-specific attacks, which respectively relate to increasing the general misclassification rate and targeting specific misclassification outcomes.
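A compact way to express both attack types, consistent with the paper's description, is as a bilevel optimization problem. The notation below (D_tr for the clean training set, D_c for the poisoning points, D_val for the attacker's validation set, and w for the learned parameters) is assumed here for illustration:

```latex
% Bilevel formulation of the poisoning attack (sketch).
% Outer problem: the attacker maximizes an objective A evaluated on the
% parameters w* learned from the poisoned training set (inner problem).
\begin{aligned}
\max_{\mathcal{D}_c}\;& \mathcal{A}(\mathcal{D}_c, w^\star)\\
\text{s.t. }\;& w^\star \in \operatorname*{arg\,min}_{w}\;
  \mathcal{L}\big(\mathcal{D}_{tr}\cup\mathcal{D}_c,\, w\big)
\end{aligned}
% Error-generic:  A(D_c, w*) =  L(D_val, w*)   (true labels: raise overall error)
% Error-specific: A(D_c, w*) = -L(D'_val, w*)  (D'_val carries attacker-desired
%                                               labels: cause targeted mistakes)
```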

Back-gradient Optimization Technique

The paper proposes back-gradient optimization to efficiently compute the gradients needed to craft poisoning points. Previous approaches tackle the underlying bilevel optimization by replacing the inner learning problem with its stationarity (KKT) conditions and differentiating implicitly, which requires the learning problem to be solved exactly at each step and is computationally intensive. The proposed approach instead computes the gradient of the attacker's objective with reverse-mode automatic differentiation while reversing the learning procedure itself, so that the intermediate parameter trajectory need not be stored. This makes it possible to generate poisoning attacks against a much broader class of algorithms trained with gradient-based procedures, including deep neural network architectures.
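The following is a minimal, illustrative sketch (not the authors' implementation) of the core idea: treat the training procedure as a differentiable function of the poisoning point and obtain the attacker's gradient by reverse-mode automatic differentiation through the unrolled updates. It uses JAX with a toy logistic-regression learner; the function names and hyperparameters (train, attacker_objective, lr, T) are assumptions for the example. The paper's back-gradient algorithm additionally reverses the parameter updates so that the forward trajectory never has to be stored.

```python
# Sketch: gradient of the validation loss w.r.t. one poisoning point, obtained
# by differentiating through an unrolled gradient-descent training loop.
import jax
import jax.numpy as jnp

def loss(w, X, y):
    # Logistic loss for a linear model, labels y in {-1, +1}.
    return jnp.mean(jnp.logaddexp(0.0, -y * (X @ w)))

def train(w0, X_tr, y_tr, x_c, y_c, lr=0.1, T=100):
    # Gradient descent on the training set augmented with the poisoning
    # point (x_c, y_c); the whole loop is differentiable w.r.t. x_c.
    X = jnp.vstack([X_tr, x_c[None, :]])
    y = jnp.concatenate([y_tr, jnp.array([y_c])])
    def step(w, _):
        return w - lr * jax.grad(loss)(w, X, y), None
    w_T, _ = jax.lax.scan(step, w0, None, length=T)
    return w_T

def attacker_objective(x_c, y_c, w0, X_tr, y_tr, X_val, y_val):
    # Error-generic attack: maximize the validation loss of the model
    # trained on the poisoned data.
    w_T = train(w0, X_tr, y_tr, x_c, y_c)
    return loss(w_T, X_val, y_val)

# Gradient of the attacker's objective w.r.t. the poisoning point.
grad_xc = jax.grad(attacker_objective, argnums=0)
```

In practice, grad_xc would be used inside a projected gradient-ascent loop that iteratively updates each poisoning point while keeping it within the feasible input domain.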

Experimental Evaluation

Figure 1: Results for PK poisoning attacks.

The effectiveness of the attack methodology was empirically validated through experiments on spam filtering, malware detection, and handwritten digit recognition. Neural networks poisoned via back-gradient optimization degraded substantially in performance with a relatively small fraction of adversarial training examples. Moreover, the experiments provide evidence of transferability: poisoning samples crafted against one learning algorithm can also degrade others, underscoring the cross-model reach of the attack.

Figure 2: Results for LK-SL poisoning attacks (transferability of poisoning samples) on Spambase (top row) and Ransomware (bottom row).

Application to Deep Networks

The paper also provides a proof-of-concept attack on a convolutional neural network (CNN) for the MNIST digit recognition task. The study observes that the shape of the network's decision boundaries in the input space influences its vulnerability to well-crafted adversarial training examples, highlighting the nuanced difficulties deep learning models face under such poisoning scenarios.

Figure 3: Error-generic (top row) and error-specific (bottom row) poisoning against multiclass LR on the MNIST data.

Conclusion

In conclusion, the study compares traditional machine learning algorithms and deep learning architectures under a well-defined attacker model. While traditional models proved clearly susceptible, the paper notes the comparative resilience of deep learning models, while acknowledging that vulnerabilities remain. Future work could address universal perturbations for deep learning models and defenses against increasingly sophisticated poisoning strategies. The research thus advances the effort to secure learning systems deployed in environments continually exposed to adversarial manipulation.
