
On the Robustness of Safe Reinforcement Learning under Observational Perturbations

(arXiv:2205.14691)

Published May 29, 2022 in cs.LG, cs.AI, and cs.RO

Abstract

Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying safety constraints. While prior works focus on performance optimality, we find that the optimal solutions of many safe RL problems are not robust and safe against carefully designed observational perturbations. We formally analyze the unique properties of designing effective observational adversarial attackers in the safe RL setting. We show that baseline adversarial attack techniques for standard RL tasks are not always effective for safe RL and propose two new approaches - one maximizes the cost and the other maximizes the reward. One interesting and counter-intuitive finding is that the maximum reward attack is strong, as it can both induce unsafe behaviors and make the attack stealthy by maintaining the reward. We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides pioneering work for investigating the safety and robustness of RL under observational attacks for future safe RL studies. Code is available at: https://github.com/liuzuxin/safe-rl-robustness

Figure: Reward and cost curves of five attackers on PPO-Lagrangian models as a function of the perturbation range ε.

Overview

  • The paper examines the robustness of Safe Reinforcement Learning (Safe RL) methods under observational perturbations like sensor noise, focusing on their vulnerabilities to adversarial state-space perturbations.

  • It introduces two novel adversarial strategies, Maximum Cost (MC) Attack and Maximum Reward (MR) Attack, to evaluate and improve the robustness of Safe RL policies.

  • The authors propose an adversarial training framework validated through extensive experiments, showing superior performance in maintaining safety and policy robustness compared to existing methods.

An Examination of the Robustness of Safe Reinforcement Learning under Observational Perturbations

Reinforcement Learning (RL) has seen notable success across various domains; however, ensuring policy safety in real-world applications remains profoundly challenging. The authors investigate the robustness of Safe Reinforcement Learning (Safe RL) when the agent's observations are perturbed. Unlike standard RL, which focuses solely on maximizing reward, Safe RL additionally requires adherence to predefined safety constraints. Observational perturbations, such as sensor noise, can significantly degrade both task performance and safety guarantees, which motivates this research direction.
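
For context, Safe RL is usually formalized as a constrained Markov decision process (CMDP): maximize expected discounted reward subject to a bound on expected discounted cost. The formulation below is the standard one with generic notation (not the paper's exact symbols); under an observational adversary, the policy additionally conditions on a perturbed state rather than the true one.

```latex
% Standard CMDP objective with cost threshold \kappa. Under an observational
% adversary \nu with budget \epsilon, actions are drawn from \pi(\cdot \mid \nu(s_t)),
% where \|\nu(s_t) - s_t\| \le \epsilon.
\max_{\pi} \ \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\right] \le \kappa
```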

Observations on Policy Vulnerabilities

The authors' primary observations highlight a crucial issue: the solutions provided by existing Safe RL methods are not robust against adversarial state-space perturbations. They establish that baseline adversarial attacks, typically evaluated in standard RL contexts, fail to fully exploit the vulnerabilities inherent in Safe RL setups. Thus, the authors introduce two novel adversarial strategies:

  • Maximum Cost (MC) Attack: searches, within the perturbation budget, for the observation perturbation that maximizes the policy's expected cost.
  • Maximum Reward (MR) Attack: counter-intuitively maximizes the reward, luring the policy into high-reward states that violate the safety constraints; because the reward is maintained, the attack is also stealthy. (A sketch of both attack objectives follows this list.)
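
As a rough illustration, both attacks can be viewed as projected gradient ascent on the observation against a learned critic: the MC attack ascends a cost critic, the MR attack a reward critic. The sketch below is a minimal, generic PGD-style attacker in PyTorch; the `policy` and `critic` interfaces, the L-infinity budget, and all hyperparameters are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def critic_attack(obs, policy, critic, epsilon=0.05, steps=10, step_size=0.01):
    """Search within an L-infinity ball of radius `epsilon` for the observation
    perturbation that maximizes critic(obs, policy(obs + delta)).

    Passing a cost critic gives an MC-style attack; passing a reward critic
    gives an MR-style attack. `policy` and `critic` are assumed to be
    differentiable torch modules (hypothetical interfaces).
    """
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        action = policy(obs + delta)          # action chosen on the perturbed observation
        value = critic(obs, action).mean()    # value of that action in the true state
        value.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()   # ascend the critic's value
            delta.clamp_(-epsilon, epsilon)          # project back into the budget
        delta.grad.zero_()
    return (obs + delta).detach()
```

Evaluating the critic at the true state (while the action comes from the perturbed one) reflects the idea that the attack targets the real consequences of the misled action rather than an imagined state.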

Theoretical and Empirical Foundations

The authors provide rigorous formalism and detailed proofs to support their proposition that RL policies are exposed to significant vulnerabilities under adversarial perturbations. Key observations include:

  1. Adversarial Vulnerability: Through a set of lemmas, they characterize the tempting policy class—policies that achieve higher reward than the optimal safe policy—and show that every such policy must violate the cost constraint, i.e., is infeasible. An adversary therefore only needs to steer the agent toward tempting behaviors to break safety.
  2. Bellman Contraction: They extend Bellman operator properties to the adversarial setting, proving that even under an optimal deterministic adversary the Bellman operator remains a contraction. This confirms that value functions can still be reliably evaluated under adversarial perturbations, laying a foundation for adversarial training.
  3. Bounded Violation: They bound the constraint violation an adversarially trained policy can incur, explicitly delineating how policy smoothness (Lipschitz continuity) and the perturbation magnitude determine the maximum possible violation; a schematic form of this bound follows the list.
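
To convey the flavor of the third result, the bound below is a schematic version with simplified constants (the precise statement, constants, and assumptions are in the paper): the extra cost incurred under an ε-bounded observational adversary scales with the policy's smoothness and the perturbation budget.

```latex
% Schematic bound: extra constraint violation of an L_\pi-Lipschitz policy \pi
% under an observational adversary \nu with \|\nu(s) - s\| \le \epsilon, assuming
% the cost value function is L_V-Lipschitz; constants are simplified here.
V_c^{\pi \circ \nu}(\mu_0) - V_c^{\pi}(\mu_0)
\;\lesssim\; \frac{L_V \, L_\pi \, \epsilon}{1 - \gamma}
```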

Pragmatic Adversarial Training Framework

To bolster Safe RL policy robustness, the authors introduce an adversarial training framework, emphasizing:

  • Adversarial Training with MC and MR Attacks: Training the policy on observations perturbed by the MC and MR attacks confers robustness against other adversaries as well, as predicted theoretically and validated empirically.
  • Convergence and Optimization: The adversarial training is built on primal-dual optimization combined with policy-based Safe RL methods. (A minimal sketch of the training loop follows this list.)
  • Adaptivity: Learning rates for the perturbations are adjusted dynamically to prevent over-exploration, preserving training stability and efficacy.
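
The following sketch shows how these pieces could fit together, reusing the `critic_attack` sketch above. The Gym-style environment API with a cost signal in `info`, the `ppo_update` routine, and all hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import torch

def robust_safe_rl_epoch(env, policy, cost_critic, ppo_update,
                         lam, cost_limit=25.0, lam_lr=0.01,
                         horizon=1000, epsilon=0.05):
    """One epoch of primal-dual safe RL training on adversarially perturbed
    observations. `ppo_update(policy, batch, lam)` is assumed to perform a
    PPO-style step on the Lagrangian objective r - lam * c (hypothetical name).
    """
    batch, total_cost = [], 0.0
    obs = env.reset()
    for _ in range(horizon):
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        # Attack the observation the policy conditions on (MC attack here;
        # swap in a reward critic for MR-style adversarial training).
        adv_obs = critic_attack(obs_t, policy, cost_critic, epsilon=epsilon)
        action = policy(adv_obs).detach().numpy()
        next_obs, reward, done, info = env.step(action)
        cost = info.get("cost", 0.0)          # Safety-Gym-style cost signal
        batch.append((obs, adv_obs, action, reward, cost))
        total_cost += cost
        obs = env.reset() if done else next_obs

    # Primal step: improve the policy on the Lagrangian objective.
    ppo_update(policy, batch, lam)

    # Dual step: raise the multiplier when accumulated cost exceeds the limit
    # (a full implementation would compare the average *episodic* cost).
    lam = max(0.0, lam + lam_lr * (total_cost - cost_limit))
    return lam
```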

Experimental Validation

Extensive experiments across various continuous control tasks validate the effectiveness of the proposed attacks and adversarial training. Key insights include:

  • Prior Methods’ Vulnerability: Policies trained with state-of-the-art Safe RL algorithms—such as PPO-Lagrangian (PPOL)—are highly vulnerable to the proposed adversarial attacks, suffering significant degradation in safety performance.
  • Superiority of Proposed Framework: The adversarially trained policies consistently outperform existing methods across multiple metrics, including attack effectiveness and safety preservation.
  • Generalization: The proposed framework exhibits adaptability across different Safe RL algorithms beyond PPOL, extending the approach's applicability and relevance.

Implications and Future Directions

This research offers a pivotal exploration of the intersection between robustness and safety in RL. Its practical implications include:

  • Deployment in Safety-Critical Domains: The framework is crucial for domains like autonomous driving and robotics, where safety violations can have catastrophic consequences.
  • Algorithm Improvement: Current Safe RL algorithms should integrate adversarial training methods to ensure robust deployment in real-world scenarios laden with uncertainties and sensor inaccuracies.
  • Scaling to Complex Systems: Future research could explore scaling the adversarial training approach to more complex and high-dimensional tasks encountered in real-world applications.

In conclusion, this work not only advances the state of the art in Safe RL but also provides practical methodologies for future applications, addressing crucial challenges related to observational perturbations and policy robustness.
