Transferable Clean-Label Poisoning Attacks on Deep Neural Nets (1905.05897v2)

Published 15 May 2019 in stat.ML, cs.CR, and cs.LG

Abstract: Clean-label poisoning attacks inject innocuous looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network's outputs, architecture, or (in some cases) training data. To achieve this, we propose a new "polytope attack" in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set.

Citations (271)

View on Semantic Scholar

Summary

The paper introduces a polytope attack method that enhances the transferability of clean-label poisoning by strategically placing malicious samples in the feature space.
It leverages dropout during poison image crafting, achieving over 50% misclassification success with just 1% of the training data compromised.
The study shows the attack’s effectiveness across various training conditions, highlighting substantial security risks for real-world deep learning systems.

Overview of "Transferable Clean-Label Poisoning Attacks on Deep Neural Nets"

The paper "Transferable Clean-Label Poisoning Attacks on Deep Neural Nets" explores the development and implications of a novel class of data poisoning attacks designed for deep neural networks (DNNs). These attacks, termed "clean-label poisoning attacks," inject plausibly legitimate but maliciously intended samples into the training datasets, to cause misclassification of a specific target sample during inference, without altering the sample labels. Unlike evasion attacks that modify inputs directly used during inference, these poisoning techniques compromise the integrity of training data, while maintaining an undetectable attack profile.

Key Contributions

Polytope Attack Methodology: The authors propose a new "polytope attack" strategy designed to increase the transferability of clean-label poisoning attacks. By positioning malicious samples in the feature space such that they encapsulate the target image, a classifier trained on this data is likely to associate the target image with the class of the malicious samples. This approach shows enhanced effectiveness over existing feature collision methods, particularly in settings where the victim model's parameters and architecture are unknown.
Use of Dropout for Transferability: Dropout, a regularization technique, is utilized during the crafting of poison images. This practice helps to improve the transferability of the attack across different models, achieving success rates of over 50% when only 1% of the training dataset is compromised.
Impact Under Various Training Conditions: The paper extends its analysis to scenarios under different learning conditions such as transfer learning, where only part of the model is retrained on the poisoned dataset, and end-to-end training, where the entire network undergoes retraining. The polytope attack maintains its effectiveness across these varied learning backdrops.

Implications and Future Directions

Security Threats in Autonomous and Real-world Systems: The attack method detailed in the paper poses a substantial threat to systems operating with DNNs, such as autonomous vehicles and surveillance systems, where training data can potentially be collected through uncontrolled means like web scraping. The attack's stealth, owing to its clean-label nature, underscores the need for heightened vigilance and robustness in data procurement processes.
Enhancements in Defense Mechanisms: The development of improved defense mechanisms against such advanced poisoning techniques becomes paramount. Potential avenues might involve enhanced anomaly detection in training datasets and designing models with stronger resistance to adversarial data.
Theoretical Insights into Model Generalization: The research suggests another angle for examining how models generalize under adversarial conditions, as certain architectures are found to be more vulnerable to the attacks. This insight could drive future studies about network architecture and its influences on adversarial robustness.

In conclusion, the paper makes significant strides in advancing the understanding of clean-label data poisoning attacks and presents a compelling case for ongoing advancements in both attack methodologies and corresponding defense strategies. These insights could substantially shape ongoing research focused on securing machine-learning systems against training data vulnerabilities.