- The paper proposes a novel imperceptible, multi-channel backdoor attack for DNNs using DCT steganography to embed triggers in RGB channels for both N-to-N and N-to-One configurations.
- It demonstrates high success rates (over 90% on both CIFAR-10 and TinyImageNet) with minimal impact on clean-data accuracy.
- The attack remains largely undetectable by defenses like Neural Cleanse, emphasizing the urgent need for improved security mechanisms in DNNs.
Imperceptible and Multi-channel Backdoor Attack against Deep Neural Networks
Introduction
Backdoor attacks on Deep Neural Networks (DNNs) represent a significant vulnerability in machine learning models, where malicious behavior is triggered by specific inputs. This paper introduces a stealthy backdoor attack method that uses Discrete Cosine Transform (DCT) steganography to embed imperceptible triggers across multiple channels of color images. The technique supports both N-to-N and N-to-One backdoor attacks, setting it apart from conventional methods, which largely focus on visible, single-trigger attacks.
Methodology
The proposed method operates in the frequency domain, using DCT steganography to embed triggers into the RGB channels of images without raising visual suspicion. The triggers are imperceptible to human observers, enhancing the stealth of the attack. Two attack variants are proposed: the N-to-N attack, in which each channel-specific trigger activates a different backdoor target, and the N-to-One attack, in which triggers across multiple channels must co-occur to activate a single backdoor, further increasing stealth and the difficulty of detection.
Figure 1: Overview of the proposed backdoor attack method.
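The channel-wise embedding described above can be sketched as follows. This is a minimal, hypothetical illustration only: the paper does not specify which DCT coefficients are perturbed or by how much, so the choice of a single mid-frequency coefficient `(4, 4)` and the perturbation `strength` are assumptions made for demonstration. The DCT is implemented directly with an orthonormal DCT-II matrix so the sketch needs only NumPy.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are cosine basis vectors)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1.0 / np.sqrt(n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def embed_channel_trigger(image, channel, coeff=(4, 4), strength=30.0):
    """Embed an imperceptible trigger into one RGB channel by nudging a
    single mid-frequency DCT coefficient (illustrative scheme, not the
    paper's exact embedding)."""
    h, w, _ = image.shape
    Dh, Dw = dct_matrix(h), dct_matrix(w)
    plane = image[:, :, channel].astype(np.float64)
    freq = Dh @ plane @ Dw.T            # 2-D DCT of the chosen channel
    freq[coeff] += strength             # additive frequency-domain trigger
    stego = image.astype(np.float64).copy()
    stego[:, :, channel] = Dh.T @ freq @ Dw   # inverse 2-D DCT
    return np.clip(np.rint(stego), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)

# N-to-N: each channel carries its own trigger, tied to a different target label.
poisoned_r = embed_channel_trigger(img, channel=0)   # red-channel trigger only

# N-to-One: triggers in all three channels must co-occur to fire the backdoor.
poisoned_all = img
for c in range(3):
    poisoned_all = embed_channel_trigger(poisoned_all, channel=c)
```

With these settings the per-pixel change stays within a couple of intensity levels, which is why such frequency-domain triggers remain invisible while still being learnable by the network.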
Experimental Results
Extensive experiments on the CIFAR-10 and TinyImageNet datasets validate the efficacy of the proposed attacks. For the N-to-N attacks, the average success rates were 93.04% on CIFAR-10 and 91.55% on TinyImageNet, reaching 100% in some cases and demonstrating robustness across the individual RGB channels. For the N-to-One attacks, success rates of 90.22% and 89.53% were achieved on CIFAR-10 and TinyImageNet, respectively.
Figure 2: Examples of backdoor instances for the proposed imperceptible N-to-N attack.
Figure 3: Examples of backdoor instances for the proposed imperceptible N-to-One attack.
The proposed attacks also had minimal impact on clean-data performance: classification accuracy on unpoisoned inputs dropped only negligibly, so the backdoor is implanted without compromising the model's general behavior.
Robustness Against Detection
The proposed backdoor attacks were tested against Neural Cleanse, a state-of-the-art defense mechanism. Neural Cleanse failed to detect the backdoors embedded via imperceptible triggers in the majority of cases, particularly under the N-to-One configuration. Even where it was partially effective against N-to-N attacks, the reverse-engineered triggers did not match the actual triggers used, demonstrating the attack's resilience against existing defenses.
Figure 4: Triggers reversed by NC and true triggers for the proposed N-to-N attack on the TinyImageNet dataset.
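To see why detection fails, recall how Neural Cleanse flags a backdoor: it reverse-engineers a candidate trigger for every label and checks whether one label's trigger is anomalously small (by L1 norm) relative to the rest, using a median-absolute-deviation outlier score. The sketch below illustrates that scoring step; the specific norm values are hypothetical, invented only to show the mechanism.

```python
import numpy as np

def anomaly_index(trigger_norms):
    """Neural Cleanse-style anomaly index: how far the smallest
    reversed-trigger L1 norm lies below the median, in MAD units."""
    norms = np.asarray(trigger_norms, dtype=float)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) * 1.4826  # consistency constant
    return (med - norms.min()) / mad

# One label with a much smaller reversed trigger stands out as an outlier;
# an anomaly index above 2 is the usual detection threshold.
visible_attack = [120, 115, 130, 118, 125, 122, 119, 127, 121, 30]
print(anomaly_index(visible_attack) > 2)   # clear outlier: flagged
```

An imperceptible multi-channel trigger spread across the frequency domain tends not to produce such a small-norm outlier for any single label, so the anomaly index stays below the threshold and the backdoor goes undetected.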
Conclusion and Implications
The research presented marks the first attempt at an imperceptible, multi-channel backdoor attack leveraging DCT steganography, achieving high success rates while remaining largely undetectable by current state-of-the-art defenses. These findings pose significant challenges for DNN security and underscore the need for improved detection mechanisms and more robust defenses against multi-channel, imperceptible threats. Future research should focus on developing defenses capable of detecting and mitigating such backdoor attacks effectively.