Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Robust Offline Reinforcement learning with Heavy-Tailed Rewards (2310.18715v2)

Published 28 Oct 2023 in cs.LG, cs.AI, and stat.ML

Abstract: This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods on the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Jin Zhu (35 papers)
  2. Runzhe Wan (19 papers)
  3. Zhengling Qi (37 papers)
  4. Shikai Luo (16 papers)
  5. Chengchun Shi (57 papers)

Summary

We haven't generated a summary for this paper yet.

Github Logo Streamline Icon: https://streamlinehq.com