Dealing with uncertainty: balancing exploration and exploitation in deep recurrent reinforcement learning (2310.08331v2)

Published 12 Oct 2023 in stat.ML and cs.LG

Abstract: Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL) is that an autonomous agent must balance two contrasting needs when making its decisions: exploiting its current knowledge of the environment to maximize the cumulative reward, and exploring actions that improve its knowledge of the environment, hopefully leading to higher reward values (the exploration-exploitation trade-off). A related issue concerns full observability of the states, which cannot be assumed in all applications, for instance when 2D images are used as input to an RL approach that must find the best actions within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques for balancing exploration and exploitation on partially observable systems, applied to predicting steering wheel angles in autonomous driving scenarios. More precisely, the final aim is to investigate the effects of using both adaptive and deterministic exploration strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapt and evaluate the impact of a modified quadratic loss function intended to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We show that adaptive methods better approximate the exploration-exploitation trade-off and that, in general, Softmax and Max-Boltzmann strategies outperform epsilon-greedy techniques.
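
The comparison of exploration strategies in the abstract hinges on how an action is drawn from the Q-values produced by the recurrent Q-network at each step. Below is a minimal, hypothetical sketch of the three selection rules named there (epsilon-greedy, Softmax/Boltzmann, and Max-Boltzmann); the function names, epsilon, and temperature values are illustrative assumptions rather than the paper's actual settings, and the adaptive variants studied in the paper would additionally adjust epsilon or the temperature online, which is not shown here.

```python
# Hypothetical sketch of three action-selection rules over a vector of Q-values.
# Epsilon and temperature values are placeholders, not the paper's configuration.
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.1) -> int:
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

def softmax_policy(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample an action from a Boltzmann (softmax) distribution over the Q-values."""
    z = q_values / temperature
    z = z - z.max()                      # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(np.random.choice(len(q_values), p=probs))

def max_boltzmann(q_values: np.ndarray, epsilon: float = 0.1, temperature: float = 1.0) -> int:
    """Act greedily with probability 1 - epsilon; otherwise explore via the Boltzmann distribution."""
    if np.random.rand() < epsilon:
        return softmax_policy(q_values, temperature)
    return int(np.argmax(q_values))

# Example: Q-values for a small discrete set of steering actions
q = np.array([0.2, 1.5, 0.7, -0.3])
print(epsilon_greedy(q), softmax_policy(q), max_boltzmann(q))
```

The design difference is that epsilon-greedy explores uniformly at random, Softmax weights exploration toward actions with higher estimated value, and Max-Boltzmann keeps greedy exploitation most of the time while replacing the uniform exploration step with the Boltzmann draw.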

Authors (2)
  1. Valentina Zangirolami (1 paper)
  2. Matteo Borrotti (3 papers)
Citations (2)

