- The paper builds on the state-adversarial MDP (SA-MDP) framework and introduces a method to learn an optimal state-observation adversary against a fixed deep reinforcement learning policy.
- A new training paradigm, Alternating Training with Learned Adversaries (ATLA), iteratively optimizes the agent and adversary to enhance policy robustness.
- Empirical results show ATLA, particularly with LSTM policies, significantly improves agent resilience against adversarial state perturbations in continuous control tasks.
Robust Reinforcement Learning with Learned Optimal Adversary
In "Robust Reinforcement Learning on State Observations with Learned Optimal Adversary," the authors address the vulnerability of deep reinforcement learning (DRL) agents to adversarial perturbations of their state observations. They propose a training approach that improves robustness, a requirement for real-world deployments where observations can be noisy or deliberately manipulated.
Problem Context and Contributions
The paper builds on the state-adversarial Markov decision process (SA-MDP) framework, which differs from more traditional formulations such as robust Markov decision processes (RMDPs): the adversary perturbs the agent's state observations rather than the transition probabilities. When sensor readings are perturbed, the agent acts on a distorted view of the true state, which motivates training policies that remain effective under such disturbances.
Two pivotal contributions underscore the research:
- Optimal Adversary Construction: The authors show how to derive an optimal adversary for a given fixed policy. By reformulating adversary learning as a standard MDP, in which the victim policy becomes part of the environment and the adversary's reward is the negated agent reward, they obtain learned attacks that degrade agent performance substantially more than prior attack strategies (a minimal sketch of this reformulation follows the list below).
- Alternating Training with Learned Adversaries (ATLA): A new training paradigm that trains the agent together with a dynamically learned adversary. By alternating between optimizing the agent's policy and the adversary's policy, ATLA pushes the agent toward intrinsic robustness against strong, adaptive attacks (the second sketch below outlines this alternating loop).
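To make the first contribution concrete, the sketch below wraps an environment so that the adversary becomes the acting agent: it sees the true state, outputs a bounded perturbation, the fixed victim policy acts on the perturbed observation, and the adversary receives the negated reward. This is a minimal illustration, not the authors' code; the classic Gym-style reset()/step() 4-tuple interface, the l-infinity budget `eps`, and the `victim_policy` callable are assumptions made here for readability.

```python
import numpy as np

class AdversaryEnv:
    """MDP seen by the adversary attacking a *fixed* victim policy.

    The adversary observes the true state, picks a perturbation delta inside
    an l-infinity ball of radius eps, the victim acts on the perturbed
    observation, and the adversary is rewarded with the negated task reward.
    """

    def __init__(self, env, victim_policy, eps):
        self.env = env                # underlying task (e.g., a MuJoCo env)
        self.victim = victim_policy   # fixed policy: observation -> action
        self.eps = eps                # perturbation budget
        self._state = None

    def reset(self):
        self._state = self.env.reset()
        return self._state            # adversary sees the unperturbed state

    def step(self, delta):
        # Project the adversary's action onto the allowed perturbation set.
        delta = np.clip(delta, -self.eps, self.eps)
        perturbed_obs = self._state + delta
        # The victim acts on the perturbed observation, not the true state.
        action = self.victim(perturbed_obs)
        next_state, reward, done, info = self.env.step(action)
        self._state = next_state
        # Negated reward: the adversary gains by hurting the victim.
        return next_state, -reward, done, info
```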
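The alternating loop itself can then be summarized as follows: each outer iteration first improves the adversary against the frozen agent (using the AdversaryEnv wrapper above), then improves the agent while the frozen adversary perturbs its observations. The `agent`/`adversary` objects exposing `act` and `train` methods, the `PerturbedEnv` wrapper, and the step budgets are hypothetical stand-ins for the PPO training used in the paper.

```python
class PerturbedEnv:
    """Environment as seen by the agent while a fixed adversary
    perturbs its observations (mirror image of AdversaryEnv)."""

    def __init__(self, env, adversary, eps):
        self.env, self.adversary, self.eps = env, adversary, eps

    def _perturb(self, state):
        # Bounded perturbation chosen by the (frozen) adversary policy.
        return state + np.clip(self.adversary(state), -self.eps, self.eps)

    def reset(self):
        return self._perturb(self.env.reset())

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        return self._perturb(state), reward, done, info


def atla_train(agent, adversary, env, eps, n_iters=50,
               adv_steps=100_000, agent_steps=100_000):
    """High-level sketch of Alternating Training with Learned Adversaries."""
    for _ in range(n_iters):
        # Phase 1: freeze the agent, train the adversary on the adversary MDP.
        adversary.train(AdversaryEnv(env, victim_policy=agent.act, eps=eps),
                        steps=adv_steps)
        # Phase 2: freeze the adversary, train the agent on perturbed inputs.
        agent.train(PerturbedEnv(env, adversary=adversary.act, eps=eps),
                    steps=agent_steps)
    return agent
```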
Additionally, the authors extend ATLA to recurrent (LSTM) policies. Because perturbed observations make the problem effectively a partially observable MDP (POMDP) from the agent's perspective, a policy that conditions on the history of observations can better infer the true underlying state and is harder to mislead with a single perturbed input.
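As a concrete illustration of why history helps, a recurrent policy carries a hidden state across time steps, so context from earlier observations can partially compensate for a perturbed current observation. The minimal PyTorch module below is a generic sketch of such a policy, not the architecture used in the paper; the hidden size and the linear action head are placeholder choices.

```python
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    """Minimal recurrent policy: the LSTM hidden state summarizes the
    observation history, which the action head conditions on."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries history across calls.
        out, hidden = self.lstm(obs_seq, hidden)
        return self.head(out), hidden
```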
Numerical Strength of Results
The experiments on MuJoCo continuous-control benchmarks show that ATLA, especially with LSTM-based policies, retains substantially more reward under strong attacks (including the learned optimal adversary) than baseline and previously proposed robust training methods, while remaining competitive when no perturbations are applied. Against attacks that markedly degrade conventional robust policies, ATLA-trained agents remain comparatively resilient.
Theoretical and Practical Implications
Theoretically, the paper advances robust DRL by evaluating and training policies against a learned, worst-case adversary rather than heuristic attacks, making the worst case explicit in both policy evaluation and policy improvement. It also shows how to balance natural (unperturbed) performance against adversarial robustness.
Practically, this research matters for deploying DRL in safety-critical applications such as autonomous vehicles, where adversarial or accidental state perturbations pose substantial risks. ATLA offers a training protocol that directly confronts these challenges, keeping agents operationally competent under both nominal and adversarial conditions.
Future Developments
While the current paper provides a robust framework for state observation adversaries, future research might explore its integration with other adversarial domains, such as action perturbations, or further automating hyperparameter choices in the adversarial attack learning process. Moreover, extending this framework to address collaborative multi-agent environments and adopting hierarchical adversarial strategies could spur additional advancements in robust DRL.
In summary, this paper marks a valuable step toward real-world applicability of DRL systems by addressing adversarial weaknesses with a theoretically grounded and empirically validated approach.