
DCE: Offline Reinforcement Learning With Double Conservative Estimates (2209.13132v1)

Published 27 Sep 2022 in cs.LG

Abstract: Offline reinforcement learning has attracted much interest as a way to address the challenges of applying traditional reinforcement learning in practice. It trains agents on previously collected datasets without any interaction with the environment. To address the overestimation of out-of-distribution (OOD) actions, conservative estimation methods assign low values to all inputs. Previous conservative estimation methods usually struggle to avoid the impact of OOD actions on Q-value estimates. In addition, these algorithms usually sacrifice some computational efficiency to achieve conservative estimation. In this paper, we propose a simple conservative estimation method, double conservative estimates (DCE), which uses two conservative estimation methods to constrain the policy. Our algorithm introduces a V-function to avoid errors on in-distribution actions while implicitly achieving conservative estimation. In addition, our algorithm uses a controllable penalty term that changes the degree of conservatism during training. We theoretically show how this method influences the estimation of OOD actions and in-distribution actions. Our experiments separately show that the two conservative estimation methods impact the estimates of all state-action pairs. DCE demonstrates state-of-the-art performance on D4RL.
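
The abstract gives no formulas, so the following is not the DCE algorithm. It is a minimal toy sketch, under stated assumptions, of the general problem the abstract describes: noisy value estimates for OOD actions get picked by a greedy policy, and a tunable penalty makes the choice more conservative. The function names, the bandit setup (all actions have true value 0, only the first few appear in the dataset), and the flat penalty are all hypothetical choices made for illustration.

```python
import random


def greedy_action(q_values, in_dataset, penalty=0.0):
    """Return the argmax action after subtracting `penalty` from OOD actions.

    Hypothetical helper: a flat penalty on unseen actions, not the DCE penalty.
    """
    adjusted = [q if seen else q - penalty
                for q, seen in zip(q_values, in_dataset)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


def ood_selection_rate(penalty, n_trials=2000, n_actions=10,
                       n_seen=3, noise=1.0, seed=0):
    """Fraction of trials in which the greedy policy picks an OOD action.

    Assumption: every action has true value 0. Actions covered by the
    dataset are estimated exactly; unseen actions get Gaussian noise.
    """
    rng = random.Random(seed)
    in_dataset = [i < n_seen for i in range(n_actions)]
    ood_picks = 0
    for _ in range(n_trials):
        q = [0.0 if seen else rng.gauss(0.0, noise) for seen in in_dataset]
        if not in_dataset[greedy_action(q, in_dataset, penalty)]:
            ood_picks += 1
    return ood_picks / n_trials


if __name__ == "__main__":
    for penalty in (0.0, 1.0, 5.0):
        rate = ood_selection_rate(penalty)
        print(f"penalty={penalty}: OOD action chosen in {rate:.1%} of trials")
```

With no penalty, the greedy choice lands on an OOD action in most trials even though no action is truly better, purely because the maximum over noisy estimates is biased upward. Raising the penalty trades that bias for conservatism, which is the kind of knob the abstract refers to as a controllable degree of conservatism.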
