Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 173 tok/s

Gemini 2.5 Pro 54 tok/s Pro

GPT-5 Medium 36 tok/s Pro

GPT-5 High 35 tok/s Pro

GPT-4o 110 tok/s Pro

Kimi K2 221 tok/s Pro

GPT OSS 120B 444 tok/s Pro

Claude Sonnet 4.5 37 tok/s Pro

2000 character limit reached

Square-root regret bounds for continuous-time episodic Markov decision processes (2210.00832v2)

Published 3 Oct 2022 in cs.LG and math.OC

Abstract: We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-horizon episodic setting. In contrast to discrete-time MDPs, the inter-transition times of a continuous-time MDP are exponentially distributed with rate parameters depending on the state--action pair at each transition. We present a learning algorithm based on the methods of value iteration and upper confidence bound. We derive an upper bound on the worst-case expected regret for the proposed algorithm, and establish a worst-case lower bound, both bounds are of the order of square-root on the number of episodes. Finally, we conduct simulation experiments to illustrate the performance of our algorithm.

References (36)