Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 167 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 36 tok/s Pro
GPT-5 High 42 tok/s Pro
GPT-4o 97 tok/s Pro
Kimi K2 203 tok/s Pro
GPT OSS 120B 442 tok/s Pro
Claude Sonnet 4.5 32 tok/s Pro
2000 character limit reached

Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient (2007.01932v2)

Published 3 Jul 2020 in cs.LG and stat.ML

Abstract: Exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration. It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations. The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objective to automatically tune the entropy temperature in SAC. We show that Meta-SAC achieves promising performances on several of the Mujoco benchmarking tasks, and outperforms SAC-v2 over 10% in one of the most challenging tasks, humanoid-v2.

Citations (17)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.