Accelerated Sim-to-Real Deep Reinforcement Learning: Learning Collision Avoidance from Human Player (2102.10711v2)

Published 21 Feb 2021 in cs.AI and cs.RO

Abstract: This paper presents a sensor-level mapless collision avoidance algorithm for use in mobile robots that map raw sensor data to linear and angular velocities and navigate in an unknown environment without a map. An efficient training strategy is proposed to allow a robot to learn from both human experience data and self-exploratory data. A game format simulation framework is designed to allow the human player to tele-operate the mobile robot to a goal and human action is also scored using the reward function. Both human player data and self-playing data are sampled using prioritized experience replay algorithm. The proposed algorithm and training strategy have been evaluated in two different experimental configurations: Environment 1, a simulated cluttered environment, and Environment 2, a simulated corridor environment, to investigate the performance. It was demonstrated that the proposed method achieved the same level of reward using only 16% of the training steps required by the standard Deep Deterministic Policy Gradient (DDPG) method in Environment 1 and 20% of that in Environment 2. In the evaluation of 20 random missions, the proposed method achieved no collision in less than 2 h and 2.5 h of training time in the two Gazebo environments respectively. The method also generated smoother trajectories than DDPG. The proposed method has also been implemented on a real robot in the real-world environment for performance evaluation. We can confirm that the trained model with the simulation software can be directly applied into the real-world scenario without further fine-tuning, further demonstrating its higher robustness than DDPG. The video and code are available: https://youtu.be/BmwxevgsdGc https://github.com/hanlinniu/turtlebot3_ddpg_collision_avoidance

Summary

The paper introduces a training strategy that combines human tele-operation data with the agent's self-exploration data, both sampled through prioritized experience replay, to cut the number of simulation training steps.
It reached the same reward level as standard DDPG with only 16% of the training steps in the cluttered environment and 20% in the corridor environment, and completed 20 random evaluation missions without collision after less than 2 h and 2.5 h of training, respectively.
Deployment on a real TurtleBot3 showed that the simulation-trained model can be applied in the real world without further fine-tuning.

Accelerated Sim-to-Real Deep Reinforcement Learning for Collision Avoidance in Mobile Robots

The paper by Hanlin Niu et al. presents an approach for improving the transfer of simulation-trained collision avoidance policies to real mobile robots. The method is sensor-level and mapless: it maps raw sensor data directly to linear and angular velocities, and it trains the policy with deep reinforcement learning (DRL) on both human experience data and self-exploratory data collected in simulation.

Methodological Foundation

The core of the work is a training strategy that combines human tele-operation with prioritized experience replay to reduce the number of training steps relative to standard Deep Deterministic Policy Gradient (DDPG). A game-format simulation framework lets a human player tele-operate the robot to a goal, and the human's actions are scored with the reward function as well. Both the human player data and the agent's self-play data are then sampled with prioritized experience replay. The human data gives the agent guidance during training in addition to what it discovers through its own exploration, and the reported real-robot results suggest that the resulting policy is also more robust than one trained with standard DDPG.
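To make the sampling idea concrete, the following is a minimal sketch of a prioritized replay buffer that holds human and self-play transitions in a single pool. It is not taken from the authors' repository: the class name `PrioritizedReplayBuffer`, the `is_human` flag, the placeholder transition tuples, and the hyperparameters `alpha`, `beta`, and `eps` are all assumptions chosen for illustration. It also assumes the proportional variant of prioritized experience replay, in which a transition is drawn with probability proportional to its priority raised to `alpha`; the summary does not say which variant the paper uses.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay over one pool of transitions (illustrative)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priorities skew sampling (assumed value)
        self.eps = eps        # keeps every priority strictly positive
        self.data = []        # stored transitions
        self.is_human = []    # source flag, kept only for inspection
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0          # next write index (ring buffer)
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.data)

    def add(self, transition, is_human=False):
        # New transitions receive the current maximum priority so that
        # they are likely to be sampled soon after being stored.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.is_human.append(is_human)
        else:
            self.data[self.pos] = transition
            self.is_human[self.pos] = is_human
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is the magnitude of the TD error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps


# Usage with placeholder transitions (state, action, reward, next_state, done).
buffer = PrioritizedReplayBuffer(capacity=1000)
for t in range(50):  # transitions recorded while a human tele-operates
    buffer.add((f"s{t}", "human_action", 0.0, f"s{t + 1}", False), is_human=True)
for t in range(50):  # transitions from the agent's own exploration
    buffer.add((f"s{t}", "agent_action", 0.0, f"s{t + 1}", False), is_human=False)

batch, idx, weights = buffer.sample(batch_size=32)
fake_td_errors = np.random.default_rng(1).normal(size=32)  # stand-in for critic TD errors
buffer.update_priorities(idx, fake_td_errors)
```

In a DDPG training loop, the TD errors would come from the critic update, and `weights` would scale each sample's contribution to the critic loss. Keeping both data sources in one pool means the priorities, rather than a fixed ratio, decide how often human transitions are replayed.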

Experimental Setup and Results

The approach was evaluated in two Gazebo simulation settings: a cluttered environment (Environment 1) and a corridor environment (Environment 2). Compared with standard DDPG, the proposed method reached the same level of reward using only 16% of the training steps in Environment 1 and 20% in Environment 2. In an evaluation of 20 random missions, it achieved zero collisions after less than 2 h of training in Environment 1 and less than 2.5 h in Environment 2. It also generated smoother trajectories than DDPG.
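A "fraction of training steps" figure like the one above can be obtained by finding, for each method, the first training step at which a smoothed reward curve reaches a common target and then dividing the two step counts. The sketch below shows that calculation under stated assumptions: the helper `first_step_reaching`, the smoothing window, and the reward curves are all hypothetical placeholders, and the numbers are synthetic, not the paper's data.

```python
import numpy as np


def first_step_reaching(steps, rewards, target, window=2):
    """Return the first step whose moving-average reward is >= target, or None."""
    rewards = np.asarray(rewards, dtype=float)
    smoothed = np.convolve(rewards, np.ones(window) / window, mode="valid")
    hits = np.nonzero(smoothed >= target)[0]
    if hits.size == 0:
        return None
    # smoothed[i] averages rewards[i : i + window], so it ends at index i + window - 1.
    return steps[hits[0] + window - 1]


# Synthetic reward curves, for illustration only (NOT results from the paper).
steps = np.arange(1, 11) * 1000
baseline_rewards = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
proposed_rewards = [0, 4, 8, 9, 9, 9, 9, 9, 9, 9]

baseline_step = first_step_reaching(steps, baseline_rewards, target=7)
proposed_step = first_step_reaching(steps, proposed_rewards, target=7)
step_fraction = proposed_step / baseline_step
print(f"proposed needs {step_fraction:.0%} of the baseline's steps")
```

The result depends on the chosen target and smoothing window, so both should be reported alongside any such ratio.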

Real-World Applicability

A key contribution is the demonstration that a model trained in simulation with this strategy can be deployed on a real robot without further fine-tuning. The trained policy was tested on a TurtleBot3 Waffle Pi in a real-world environment, which the authors take as further evidence that it is more robust than a DDPG-trained policy.

Implications and Future Directions

The work has both practical and methodological implications. Practically, it targets mobile robots in unstructured real-world environments, where rapid and reliable deployment matters. Methodologically, it shows that human demonstration data can be combined with a DRL algorithm to improve training efficiency, which supports further work on hybrid approaches that draw on human experience.

Looking forward, potential future work could explore the incorporation of richer sensory data such as RGB and depth information to further enhance the situational awareness of autonomous agents. Furthermore, introducing recurrent neural network architectures, such as LSTMs, could bolster the agent's capability to maintain situational context over extended time horizons, thereby improving navigation and decision-making in complex environments. The application and adaptation of these methods to social robotics and dynamic human-robot interaction scenarios also present promising avenues for further research.

In conclusion, the paper shows that combining human guidance with reinforcement learning can produce robust collision avoidance policies for mobile robots while reducing training cost and easing sim-to-real transfer.