DeLF: Designing Learning Environments with Foundation Models (2401.08936v1)
Abstract: Reinforcement learning (RL) offers a capable and intuitive structure for the fundamental sequential decision-making problem. Despite impressive breakthroughs, RL can still be difficult to employ in practice, even in many simple applications. In this paper, we try to address this issue by introducing a method for designing the components of the RL environment for a given, user-intended application. We provide an initial formalization of the problem of RL component design, which concentrates on designing a good representation for the observation and action spaces. We propose a method named DeLF: Designing Learning Environments with Foundation Models, which employs large language models (LLMs) to design and codify the user's intended learning scenario. By testing our method on four different learning environments, we demonstrate that DeLF can obtain executable environment code for the corresponding RL problems.
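To make the phrase "executable environment code" concrete, the following is a minimal sketch of what an environment with an explicit observation and action representation might look like. It is not output from DeLF and not taken from the paper: the `CorridorEnv` class, its fields, and the reward scheme are all hypothetical, and the `reset`/`step` interface only loosely imitates the convention common in RL libraries without depending on any of them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CorridorEnv:
    """Hypothetical 1-D corridor task (illustrative only, not from the paper).

    Observation space: one integer in [0, size - 1], the agent's current cell.
    Action space: 0 = move left, 1 = move right.
    """

    size: int = 5
    position: int = 0
    n_actions: int = 2

    def reset(self) -> int:
        """Place the agent in the leftmost cell and return the first observation."""
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action and return (observation, reward, terminated)."""
        if action not in (0, 1):
            raise ValueError(f"invalid action {action!r}; expected 0 or 1")
        delta = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(self.size - 1, max(0, self.position + delta))
        terminated = self.position == self.size - 1
        # Assumed reward scheme: 1.0 on reaching the last cell, 0.0 otherwise.
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated


if __name__ == "__main__":
    env = CorridorEnv(size=4)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done = env.step(1)  # always move right
        print(obs, reward, done)
```

Even in this toy case the design choices the abstract refers to are visible: the observation could instead be a one-hot vector or a distance to the goal, and the action set could include a "stay" action. Choosing among such representations is the component-design problem the paper formalizes.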