
Online Learning for Unknown Partially Observable MDPs (2102.12661v2)

Published 25 Feb 2021 in cs.LG

Abstract: Solving Partially Observable Markov Decision Processes (POMDPs) is hard. Learning optimal controllers for POMDPs when the model is unknown is harder. Online learning of optimal controllers for unknown POMDPs, which requires efficient learning using regret-minimizing algorithms that effectively trade off exploration and exploitation, is even harder, and no solution currently exists. In this paper, we consider infinite-horizon average-cost POMDPs with an unknown transition model but a known observation model. We propose a natural posterior sampling-based reinforcement learning algorithm (PSRL-POMDP) and show that it achieves a regret bound of $O(\log T)$, where $T$ is the time horizon, when the parameter set is finite. In the general case (continuous parameter set), we show that the algorithm achieves $O(T^{2/3})$ regret under two technical assumptions. To the best of our knowledge, this is the first online RL algorithm for POMDPs with sub-linear regret.
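
To give a concrete feel for the posterior-sampling idea in the abstract, the sketch below runs a PSRL-style loop on a toy POMDP with a finite parameter set. It is a minimal illustration, not the paper's algorithm: the 2-state environment, the cost and observation matrices, the fixed episode length, and the discounted QMDP planner (used as a stand-in for an average-cost planning oracle) are all assumptions made for this sketch.

# Illustrative sketch: posterior sampling over a finite set of candidate
# transition models, with a known observation model (assumed toy setup).
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_obs = 2, 2, 2

# Finite parameter set: candidate transition kernels P_theta[a, s, s'].
thetas = [
    np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]]),
    np.array([[[0.3, 0.7], [0.7, 0.3]],
              [[0.5, 0.5], [0.5, 0.5]]]),
]
true_theta = 0                                  # true model, unknown to the learner

# Known observation model O[s, o] and per-step cost c[s, a] (assumed values).
O = np.array([[0.85, 0.15], [0.15, 0.85]])
cost = np.array([[0.0, 0.5], [1.0, 0.5]])

def q_values(P, gamma=0.95, iters=200):
    # Discounted value iteration on the fully observable MDP with kernel P.
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.min(axis=1)                       # cost minimization
        Q = cost + gamma * np.einsum("asn,n->sa", P, V)
    return Q

def belief_update(b, a, o):
    # Exact recursive filter over the joint (model index, hidden state).
    new_b = np.zeros_like(b)
    for k, P in enumerate(thetas):
        new_b[k] = (b[k] @ P[a]) * O[:, o]      # predict with P_k, correct with O
    return new_b / new_b.sum()

T, episode_len = 5000, 50
s = 0                                           # true hidden state
b = np.full((len(thetas), n_states), 1.0 / (len(thetas) * n_states))
total_cost = 0.0

for t in range(T):
    if t % episode_len == 0:
        # Posterior sampling: draw a model from the marginal posterior, replan.
        posterior = b.sum(axis=1)
        k_hat = rng.choice(len(thetas), p=posterior)
        Q = q_values(thetas[k_hat])

    # QMDP-style action choice under the state belief conditioned on k_hat.
    b_state = b[k_hat] / b[k_hat].sum()
    a = int(np.argmin(b_state @ Q))

    # Environment step using the true (hidden) model.
    total_cost += cost[s, a]
    s = rng.choice(n_states, p=thetas[true_theta][a, s])
    o = rng.choice(n_obs, p=O[s])
    b = belief_update(b, a, o)

print(f"average cost over {T} steps: {total_cost / T:.3f}")
print("posterior over models:", b.sum(axis=1).round(3))

In this toy run the marginal posterior concentrates on the true model and the sampled policy stabilizes; the paper's analysis, by contrast, bounds regret for the actual PSRL-POMDP algorithm rather than for this simplified planner.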

Authors (3)
  1. Mehdi Jafarnia-Jahromi (10 papers)
  2. Rahul Jain (152 papers)
  3. Ashutosh Nayyar (54 papers)
Citations (21)
