Can Go AIs be adversarially robust? (2406.12843v3)

Published 18 Jun 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robustness in Go, a favorable domain for robustness since it benefits from incredible average-case capability and a narrow, innately adversarial setting. We test three defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries. Furthermore, most of the reliably effective attacks these adversaries discover are different realizations of the same overall class of cyclic attacks. Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization of defenses, and diversity in training. For interactive examples of attacks and a link to our codebase, see https://goattack.far.ai.

Citations (2)

View on Semantic Scholar

Summary

The paper evaluates various defense strategies, including positional and iterated adversarial training, to address vulnerabilities in superhuman Go AIs.
The paper finds that evolved cyclic attacks can defeat the defenses even under high computational effort, revealing persistent gaps in worst-case performance.
The paper demonstrates that transitioning to vision transformers does not significantly improve robustness, emphasizing the need for innovative training approaches.

Evaluation of Go AIs and Their Adversarial Robustness

The paper "Can Go AIs Be Adversarially Robust?" by Tseng et al. investigates the resilience of superhuman Go AI against adversarial attacks. The motivation stems from the observation that, although these AI systems surpass human performance in average-case metrics, their worst-case performance, especially under adversarial conditions, remains a critical challenge. This work is particularly pertinent given the potential deployment of AI in high-stakes, real-world applications where robustness against worst-case scenarios is paramount.

Key Findings and Methodologies

The authors assess three primary defense strategies against adversarial attacks: positional adversarial training using manually constructed positions, iterated adversarial training emphasizing repeated rounds of attack and defense, and alternative neural network architectures, specifically employing vision transformers (ViTs) in lieu of conventional convolutional neural networks (CNNs).

Positional Adversarial Training: The method extends KataGo’s training with adversarial examples based on prior strategies. Nevertheless, new adversaries trained through fine-tuning previously known strategies such as the cyclic attack could still defeat the defended models. Notably, these cyclic attacks proved adaptable enough to find variations that exploit the AI with considerable success, even at high computation budgets for victim agents.
Iterated Adversarial Training: Here, the adversarial training aimed to simulate an arms race, with adversaries continuously evolving strategies to exploit AI defenses. The findings revealed that the developed defenses were highly specific, demonstrating robust performance against observed attacks but remaining vulnerable to new, unforeseen attacks. The high sample complexity and computational overhead of training robust adversaries emerged as significant bottlenecks.
Vision Transformer Architecture: The transition from CNNs to ViTs intended to test if known limitations in CNN's inductive biases contributed to vulnerabilities. However, ViTs, despite offering a distinct architectural approach, did not significantly improve adversarial robustness. They remained susceptible to adversarial strategies similar to those affecting CNN-based Go AIs, such as cyclic attacks.

Numerical Analysis and Challenges

Several compelling numerical results emerged. Notably, specific adversaries such as the "big-adversary" and "continuous-adversary" demonstrated substantial win rates over even defended systems with minimal computational investment relative to the victim’s training. This underlines the persistent difficulty in translating adversarial training successes into comprehensive robustness for Go AIs.

Additionally, the authors reveal that even defenses which marginally increase the computational cost for adversarial training do not obviate the fundamental vulnerabilities inherent in the learning algorithms. This suggests an intrinsic challenge in the AI training process rather than straightforward architectural faults.

Implications and Future Directions

The implications of this work are notably twofold. Practically, it highlights the considerable obstacles still facing AI robustness, even within the confined and well-balanced domain of Go. Theoretically, it underscores the potential divergence between methodologies that yield robust systems versus those optimizing for superior average-case capabilities. There is a significant demand for algorithmic innovations that facilitate more generalized defenses, especially given the growing integration of AI in societal infrastructures.

Future efforts should explore training paradigm innovations, such as more dynamic adversarial training approaches or fundamentally novel architectures that perhaps avoid local equilibria better than current methods. Moreover, this work could inspire rigorous robustness assessments across different AI domains, notably where service-critical AI deployment is anticipated.

In conclusion, while the tried defenses marginally increase difficulty for adversarial attacks against Go AIs, achieving full robustness, free from exploitable weaknesses, remains an elusive goal. This underscores an imperative for ongoing research into adversarial strategies, scalable robust training, and architectural novelties to inform fundamentally robust AI design.

PDF Markdown

Related Papers

Tweets

https://twitter.com/ai_in_check/status/1889734689360933050

https://twitter.com/ai_in_check/status/1882177046887186462

https://twitter.com/farairesearch/status/1806348039004602516

https://twitter.com/AndrewW66619812/status/1810724726991638597

https://twitter.com/firoozye/status/1810897307618951521

https://twitter.com/DatTran81322671/status/1830723552820986252