- The paper introduces an ADP regularizer that promotes ensemble diversity to significantly improve adversarial robustness.
- It combines a logarithmic ensemble diversity (LED) term with an ensemble entropy term to encourage mutually orthogonal non-maximal predictions among the member models.
- Empirical results on MNIST and CIFAR-10 confirm enhanced resistance against FGSM, BIM, and PGD attacks with minimal overhead.
Insights into "Improving Adversarial Robustness via Promoting Ensemble Diversity"
The paper "Improving Adversarial Robustness via Promoting Ensemble Diversity" presents a novel approach to enhancing the adversarial robustness of ensembles of deep neural networks (DNNs). The cornerstone of the methodology is promoting what the authors term ensemble diversity: the variation among the predictions of the individual networks in an ensemble, measured specifically on the non-maximal predictions, with the aim of building stronger adversarial defenses.
Overview of the Proposed Methodology
In the context of DNNs, individual models are often susceptible to adversarial attacks, which exploit the network's vulnerabilities by altering inputs in a barely perceptible manner to yield incorrect outputs. Traditional approaches bolster each network separately and aggregate outputs without emphasizing the intrinsic interactions among ensemble members.
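As a concrete illustration of such an attack, the Fast Gradient Sign Method (FGSM) perturbs an input by a small step in the direction of the sign of the loss gradient. The sketch below is a minimal example, assuming a hypothetical logistic-regression model in NumPy (the weights and inputs are illustrative, not from the paper), where the input gradient is available in closed form:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, y, eps):
    """One FGSM step against a logistic model p = sigmoid(w . x).

    For binary cross-entropy, the gradient of the loss w.r.t. the
    input is (p - y) * w, so no autodiff is needed in this toy case.
    """
    p = sigmoid(w @ x)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

# Illustrative input and weights (not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
w = rng.normal(size=8)
x_adv = fgsm_perturb(x, w, y=1.0, eps=0.05)

# The perturbation stays inside an L-infinity ball of radius eps,
# while pushing the model's confidence in the true label down.
print(np.max(np.abs(x_adv - x)))
```

The per-coordinate step of size eps is what makes the change "barely perceptible" while still moving the loss in the worst-case direction.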
The paper introduces an Adaptive Diversity Promoting (ADP) regularizer which is crucial for enhancing ensemble diversity. This regularizer consists of:
- A Logarithm of Ensemble Diversity (LED) term, defined via the determinant of the Gram matrix formed by the models' normalized non-maximal predictions, i.e., each model's predicted probabilities on the classes other than the true label.
- An Ensemble Entropy term, the Shannon entropy of the averaged ensemble prediction, which captures prediction uncertainty at the ensemble level.
The optimization objective balances these components, encouraging orthogonality among predictions of different models, thereby making it more difficult for adversarial examples targeting one model to transfer effectively across the ensemble.
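Putting the two terms together, a minimal NumPy sketch of the ADP term for a single labeled sample might look as follows (the hyperparameter values alpha and beta and the epsilon constants are illustrative assumptions, not values from the paper):

```python
import numpy as np

def adp_regularizer(probs, y, alpha=2.0, beta=0.5, eps=1e-20):
    """ADP-style regularizer for one sample.

    probs: (K, L) array of softmax outputs from K ensemble members
    over L classes; y: true class index. Returns
    alpha * ensemble entropy + beta * log(ensemble diversity),
    which is added to the objective (i.e., subtracted from the loss).
    """
    # Ensemble entropy of the averaged prediction.
    f_mean = probs.mean(axis=0)
    entropy = -np.sum(f_mean * np.log(f_mean + eps))

    # Non-maximal predictions: drop the true-class column, then
    # normalize each model's remaining (L-1)-dim vector to unit norm.
    m = np.delete(probs, y, axis=1)
    m = m / (np.linalg.norm(m, axis=1, keepdims=True) + eps)

    # Ensemble diversity: the Gram determinant, i.e., the squared
    # volume spanned by the normalized non-maximal prediction vectors;
    # it is maximal when those vectors are mutually orthogonal.
    ed = np.linalg.det(m @ m.T)
    return alpha * entropy + beta * np.log(ed + eps)

# Three models agreeing on class 0 but spreading residual mass over
# different wrong classes have near-orthogonal non-maximal predictions,
# so they score much higher than three identical models.
diverse = np.array([[0.7, 0.3, 0.0, 0.0],
                    [0.7, 0.0, 0.3, 0.0],
                    [0.7, 0.0, 0.0, 0.3]])
identical = np.tile([0.7, 0.1, 0.1, 0.1], (3, 1))
print(adp_regularizer(diverse, 0) > adp_regularizer(identical, 0))  # True
```

Note that both ensembles above agree on the correct class, so the diversity is purchased entirely on the non-maximal entries, which is why accuracy need not suffer.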
Numerical Results and Claims
The empirical results on MNIST and CIFAR-10 demonstrate significant improvements in adversarial robustness when the ADP regularizer is employed. The ensembles also maintain high accuracy on clean inputs, indicating that the robustness gains do not come at the expense of standard accuracy.
The paper verifies this against several adversarial attack algorithms, including FGSM, BIM, and PGD, showing enhanced robustness compared to baselines trained without diversity promotion. The gains are especially pronounced against transfer-based attacks, where adversarial examples crafted on one model are applied to others, illustrating the efficacy of the proposed approach.
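For reference, the iterative attacks (BIM/PGD) extend the single gradient-sign step with repeated updates and projection back onto the allowed perturbation ball. A generic sketch, with grad_fn standing in for any model's input-gradient routine (a hypothetical placeholder, not the paper's code):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.03, step=0.01, n_steps=10):
    """Projected gradient ascent on the loss under an L-infinity bound.

    grad_fn(x_adv) should return the gradient of the loss w.r.t. the
    adversarial input; it is a stand-in for whatever model (or
    ensemble) is being attacked.
    """
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        # Project back onto the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy loss 0.5 * ||x||^2, whose input gradient is x itself, purely to
# show that the iterates stay inside the L-infinity constraint.
x = np.linspace(-1.0, 1.0, 5)
x_adv = pgd_attack(x, grad_fn=lambda z: z)
print(np.max(np.abs(x_adv - x)))
```

An ensemble is robust in this setting when such iterates fail to flip its aggregated prediction, which is exactly what the diversity among member models makes harder.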
Theoretical Implications
The methodology reshapes our understanding of ensemble learning in adversarial settings by introducing a quantitative measure of ensemble diversity that aligns with robust prediction performance. Unlike prior definitions based on prediction errors, this approach reconsiders diversity in terms of non-maximal predictions, maintaining ensemble accuracy while enhancing robustness.
Furthermore, the paper explores the conditions under which the ADP regularizer succeeds, with the possibility of adapting to varying ensemble sizes and classifier outputs. The strategy of encouraging diversity through non-maximal prediction orthogonality represents a new frontier in adversarial defense, potentially serving as a foundation for future explorations in ensemble-based defenses.
Practical Implications and Future Directions
From a practical standpoint, the computational overhead of the method is modest, making it feasible for large-scale tasks and hence for deployment in real-world systems. Its compatibility with existing adversarial defense mechanisms, such as adversarial training, further highlights the versatility and practicality of the approach.
The potential avenues for further exploration include the extension of this methodology to other forms of machine learning models and tasks beyond classification. Additionally, refining the ensemble diversity metric or leveraging it for the detection of adversarial inputs presents an intriguing area for further research.
In conclusion, the introduction of ensemble diversity promotion via the ADP regularizer marks a significant step forward in the quest for robust AI systems, providing a compelling strategy for adversarial resilience in ensemble learning frameworks.