
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation (2301.13087v1)

Published 30 Jan 2023 in cs.LG and stat.ML

Abstract: We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror-descent and least squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an $\widetilde O(K^{6/7})$ regret bound, improving significantly over previous state-of-the-art of $\widetilde O(K^{14/15})$ in this setting. In addition, we present a version of the same algorithm under the assumption a simulator of the environment is available to the learner (but otherwise no exploratory assumptions are made), and prove it obtains state-of-the-art regret of $\widetilde O(K^{2/3})$.
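
To get a feel for the gap between the three exponents quoted in the abstract, the following is a minimal sketch that compares only the polynomial part of each bound. It assumes $K$ denotes the number of episodes, and it drops all constants, logarithmic factors, and problem-dependent terms (dimension, horizon) hidden by $\widetilde O(\cdot)$, so the numbers indicate growth rates rather than actual regret. The function names and labels are illustrative and do not come from the paper.

```python
# Illustrative comparison of the regret exponents quoted in the abstract.
# Assumption: K is the number of episodes; constants and log factors
# hidden by the O-tilde notation are ignored entirely.

RATES = {
    "previous state of the art, bandit feedback": 14 / 15,
    "this paper, bandit feedback": 6 / 7,
    "this paper, with simulator": 2 / 3,
}


def regret_rate(K: int, exponent: float) -> float:
    """Polynomial part of a regret bound: K ** exponent."""
    return float(K) ** exponent


def per_episode_regret(K: int, exponent: float) -> float:
    """Average regret per episode implied by the rate: K ** (exponent - 1)."""
    return regret_rate(K, exponent) / K


if __name__ == "__main__":
    for K in (10**3, 10**6, 10**9):
        print(f"K = {K:.0e}")
        for label, exponent in RATES.items():
            avg = per_episode_regret(K, exponent)
            print(f"  {label}: K^{exponent:.3f}, per-episode rate {avg:.4g}")
```

Because every exponent is below 1, the per-episode rate shrinks as $K$ grows in all three cases; a smaller exponent means it shrinks faster.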

Authors (3)
  1. Uri Sherman (10 papers)
  2. Tomer Koren (79 papers)
  3. Yishay Mansour (158 papers)
Citations (11)
