Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 31 tok/s
Gemini 2.5 Pro 50 tok/s Pro
GPT-5 Medium 11 tok/s Pro
GPT-5 High 9 tok/s Pro
GPT-4o 77 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 463 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

Efficient Exploration through Bayesian Deep Q-Networks (1802.04412v4)

Published 13 Feb 2018 in cs.AI, cs.LG, and stat.ML

Abstract: We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d sqrt{T}) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of Q-network using linear regression, we use Bayesian linear regression, resulting in an approximated posterior over Q-function. This allows us to directly incorporate the uncertainty over the Q-function and deploy Thompson sampling on the learned posterior distribution resulting in efficient exploration/exploitation trade-off. We empirically study the behavior of BDQN on a wide range of Atari games. Since BDQN carries out more efficient exploration and exploitation, it is able to reach higher return substantially faster compared to DDQN.

Citations (156)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.