
Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL (1812.00045v1)

Published 30 Nov 2018 in cs.LG, cs.AI, and cs.NE

Abstract: Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, several challenges remain, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, Terminal Prediction, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrations, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, both of our proposed methods learn faster and converge to better policies on a two-player mini version of the Pommerman game.
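
The abstract describes two additions to A3C: an auxiliary Terminal Prediction head and a supervised term that imitates a demonstrator such as MCTS. Below is a minimal PyTorch-style sketch of how the two extra loss terms might look. The function names, the t/T target for Terminal Prediction, and the cross-entropy form of the demonstrator term are assumptions inferred from the abstract, not code from the paper.

```python
import torch
import torch.nn.functional as F

def terminal_prediction_loss(tp_preds, step_indices, episode_length):
    """Auxiliary loss for the Terminal Prediction head.

    Assumed target for step t in an episode of length T is t / T, so the
    head learns to estimate temporal closeness to the terminal state.
    """
    targets = step_indices.float() / float(episode_length)
    return F.mse_loss(tp_preds.squeeze(-1), targets)

def demonstrator_imitation_loss(policy_logits, demo_actions):
    """Supervised term pushing the policy toward actions chosen by a
    demonstrator such as MCTS (hypothetical form of the framework's
    imitation objective)."""
    return F.cross_entropy(policy_logits, demo_actions)

if __name__ == "__main__":
    # Toy usage on dummy data (weights lambda_tp / lambda_demo would
    # scale these terms when added to the usual A3C loss).
    T = 8                              # episode length
    preds = torch.rand(T, 1)           # dummy TP head outputs
    steps = torch.arange(1, T + 1)     # 1-indexed step counters
    print(terminal_prediction_loss(preds, steps, T))
```

In this reading, each term is simply added to the standard A3C actor-critic loss with its own weight, leaving the asynchronous training loop unchanged.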

Authors (3)
  1. Bilal Kartal (12 papers)
  2. Pablo Hernandez-Leal (13 papers)
  3. Matthew E. Taylor (69 papers)
Citations (9)
