Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 45 tok/s
Gemini 2.5 Pro 52 tok/s Pro
GPT-5 Medium 30 tok/s Pro
GPT-5 High 24 tok/s Pro
GPT-4o 96 tok/s Pro
Kimi K2 206 tok/s Pro
GPT OSS 120B 457 tok/s Pro
Claude Sonnet 4 36 tok/s Pro
2000 character limit reached

Towards model-free RL algorithms that scale well with unstructured data (2311.02215v1)

Published 3 Nov 2023 in cs.LG and cs.AI

Abstract: Conventional reinforcement learning (RL) algorithms exhibit broad generality in their theoretical formulation and high performance on several challenging domains when combined with powerful function approximation. However, developing RL algorithms that perform well across problems with unstructured observations at scale remains challenging because most function approximation methods rely on externally provisioned knowledge about the structure of the input for good performance (e.g. convolutional networks, graph neural networks, tile-coding). A common practice in RL is to evaluate algorithms on a single problem, or on problems with limited variation in the observation scale. RL practitioners lack a systematic way to study how well a single RL algorithm performs when instantiated across a range of problem scales, and they lack function approximation techniques that scale well with unstructured observations. We address these limitations by providing environments and algorithms to study scaling for unstructured observation vectors and flat action spaces. We introduce a family of combinatorial RL problems with an exponentially large state space and high-dimensional dynamics but where linear computation is sufficient to learn a (nonlinear) value function estimate for performant control. We provide an algorithm that constructs reward-relevant general value function (GVF) questions to find and exploit predictive structure directly from the experience stream. In an empirical evaluation of the approach on synthetic problems, we observe a sample complexity that scales linearly with the observation size. The proposed algorithm reliably outperforms a conventional deep RL algorithm on these scaling problems, and they exhibit several desirable auxiliary properties. These results suggest new algorithmic mechanisms by which algorithms can learn at scale from unstructured data.

Citations (1)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.