Discrete-to-Deep Supervised Policy Learning (2005.02057v1)

Published 5 May 2020 in cs.LG and stat.ML

Abstract: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally it uses input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method with two RL environments, the Cartpole and an aircraft manoeuvring simulator.

Citations (3)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Discrete-to-Deep Supervised Policy Learning (2005.02057v1)

Summary

Related Papers