
MULEX: Disentangling Exploitation from Exploration in Deep RL (1907.00868v1)

Published 1 Jul 2019 in cs.LG, cs.AI, and stat.ML

Abstract: An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually obtained by perturbing either the agent's actions (e.g., ε-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it receives (e.g., exploration bonus, intrinsic motivation, or hand-shaped rewards). Here, we adopt a disruptive but simple and generic perspective, where we explicitly disentangle exploration and exploitation. Different losses are optimized in parallel, one of them coming from the true objective (maximizing cumulative rewards from the environment) and others being related to exploration. Every loss is used in turn to learn a policy that generates transitions, all shared in a single replay buffer. Off-policy methods are then applied to these transitions to optimize each loss. We showcase our approach on a hard-exploration environment, show its sample-efficiency and robustness, and discuss further implications.
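The disentangling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and every concrete choice here is a hypothetical stand-in: a 10-state chain environment, tabular Q-learning as the off-policy learner, a single exploration loss built from a count-based bonus 1/sqrt(n(s')), and policies that take turns every episode. Each loss keeps its own Q-table and acts greedily when its turn comes; all transitions go into one shared replay buffer, and every Q-table is updated off-policy from that buffer using only its own reward.

```python
import math
import random
from collections import defaultdict

N_STATES = 10          # hypothetical chain: states 0..9, episodes start at 0
GOAL = N_STATES - 1    # reaching the last state ends the episode with reward 1
ACTIONS = (0, 1)       # 0 = left, 1 = right


def step(state, action):
    """Deterministic chain dynamics; moving left from state 0 stays at 0."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Greedy action w.r.t. a Q-table, breaking ties uniformly at random."""
    vals = [q[(state, a)] for a in ACTIONS]
    best = max(vals)
    return random.choice([a for a, v in zip(ACTIONS, vals) if v == best])


def q_update(q, batch, reward_fn, alpha=0.5, gamma=0.95):
    """One off-policy Q-learning pass over a minibatch, using this loss's own reward."""
    for s, a, r, s2, done in batch:
        target = reward_fn(r, s2)
        if not done:
            target += gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])


def train(episodes=200, max_steps=50, batch_size=32, seed=0):
    random.seed(seed)
    buffer = []                # single replay buffer shared by every policy
    counts = defaultdict(int)  # state visit counts for the hypothetical exploration bonus
    losses = {
        # true objective: the environment reward only
        "exploit": (defaultdict(float), lambda r, s2: r),
        # hypothetical exploration loss: count-based bonus, ignores the env reward
        "explore": (defaultdict(float), lambda r, s2: 1.0 / math.sqrt(counts[s2])),
    }
    names = list(losses)
    for ep in range(episodes):
        q_act, _ = losses[names[ep % len(names)]]  # losses take turns generating data
        state = 0
        for _ in range(max_steps):
            action = greedy(q_act, state)
            nxt, r, done = step(state, action)
            counts[nxt] += 1
            buffer.append((state, action, r, nxt, done))
            # every loss learns off-policy from the same shared transitions
            for q, reward_fn in losses.values():
                batch = random.sample(buffer, min(batch_size, len(buffer)))
                q_update(q, batch, reward_fn)
            state = nxt
            if done:
                break
    return losses["exploit"][0]


def greedy_rollout(q, max_steps=50):
    """Run the greedy policy of q from state 0; return steps to the goal, or None."""
    state = 0
    for t in range(1, max_steps + 1):
        state, _, done = step(state, greedy(q, state))
        if done:
            return t
    return None


if __name__ == "__main__":
    q_exploit = train()
    print(greedy_rollout(q_exploit))  # steps the purely exploiting policy needs
```

In this sketch the exploit Q-table never receives the bonus, so its greedy policy reflects only the environment reward; exploration influences it solely through the transitions that the explore policy adds to the shared buffer.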

Authors (6)
  1. Lucas Beyer (46 papers)
  2. Damien Vincent (25 papers)
  3. Olivier Teboul (12 papers)
  4. Sylvain Gelly (43 papers)
  5. Matthieu Geist (93 papers)
  6. Olivier Pietquin (90 papers)
Citations (14)
