- The paper introduces a generalized computation graph that unifies model-free and model-based reinforcement learning for robot navigation policies.
- It demonstrates that predicting discrete collision events speeds up learning and improves policy stability.
- Evaluations in simulation and on a real-world RC car show navigation performance superior to Q-learning baselines, with minimal human intervention.
Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
The paper presents a novel approach to robot navigation: a generalized computation graph that subsumes elements of both model-free and model-based reinforcement learning (RL). The authors challenge traditional navigation strategies with a self-supervised learning framework that emphasizes sample efficiency and training stability in complex, dynamic environments.
The paper begins by identifying the limitations of conventional navigation methods, which rely heavily on building internal maps, localizing within them, and planning paths through them. These traditional pipelines embed numerous assumptions about the environment, incurring computational overhead and adapting poorly to unexpected scenarios. The authors contrast this with learning-based approaches, which can adapt through experience but suffer from high sample complexity and are difficult to deploy in the real world.
The principal innovation of this paper is a generalized computation graph that subsumes both value-based model-free and model-based algorithms as special cases. The framework aims to combine the sample efficiency of model-based learning with the strong asymptotic performance of model-free methods on high-dimensional tasks: model-free methods such as Q-learning handle complex tasks well but are sample-inefficient, whereas model-based methods learn quickly but struggle with high-dimensional inputs such as raw images.
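To make the unification concrete, the sketch below shows (with illustrative, assumed names; this is not the authors' code) how a single horizon parameter interpolates between the two regimes: a horizon of one with a bootstrapped value estimate recovers the standard one-step Q-learning target, while longer horizons place more weight on the model's multi-step predictions.

```python
def horizon_h_target(rewards, bootstrap_value, gamma=0.99):
    """Hypothetical H-step target illustrating the generalized graph.

    rewards:         per-step (or model-predicted) rewards r_{t+1..t+H}
    bootstrap_value: model-free value estimate at the horizon, V(s_{t+H})

    With len(rewards) == 1 this is the one-step Q-learning target
    r + gamma * V(s'); longer reward lists rely increasingly on the
    model's multi-step predictions instead of the bootstrapped value.
    """
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Example: one-step (model-free) vs. five-step (more model-based) targets.
q_target = horizon_h_target([1.0], bootstrap_value=10.0)
h5_target = horizon_h_target([1.0] * 5, bootstrap_value=10.0)
```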
The generalized computation graph is instantiated as a deep recurrent neural network (RNN) that processes high-dimensional state inputs, such as raw images, together with a candidate sequence of actions, and predicts the outcomes of executing that sequence. The research explores the design space of this framework, including the choice of model outputs: collision probabilities versus continuous value estimates. A central finding is that predicting discrete collision events yields faster and more stable learning than regressing continuous values.
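A minimal sketch of such an architecture is given below, assuming PyTorch; the layer sizes, names, and action encoding are illustrative rather than the authors' exact network. The image is encoded into the RNN's initial hidden state, the RNN is unrolled over a planned action sequence, and each step emits a collision logit trained with a standard cross-entropy loss against observed collision labels.

```python
import torch
import torch.nn as nn

class CollisionPredictor(nn.Module):
    """Illustrative network: encode an image, unroll an RNN over a planned
    action sequence, and emit one collision logit per future step."""

    def __init__(self, action_dim=2, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(               # image -> initial hidden state
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, hidden_dim),
        )
        self.rnn = nn.GRU(action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)        # per-step collision logit

    def forward(self, image, actions):
        # image: (B, 1, H, W) grayscale; actions: (B, steps, action_dim)
        h0 = self.encoder(image).unsqueeze(0)       # (1, B, hidden_dim)
        out, _ = self.rnn(actions, h0)              # (B, steps, hidden_dim)
        return self.head(out).squeeze(-1)           # (B, steps) logits

# Training uses ordinary supervised labels from the robot's own experience:
# loss = nn.BCEWithLogitsLoss()(model(image, actions), collision_labels)
```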
Empirical evaluations are conducted in a simulated environment, where an RC car navigates cluttered hallways, and in a real-world indoor complex. The results demonstrate superior performance over prior methods such as single-step and multi-step Q-learning, particularly in learning stability and final policy quality. Notably, the real-world RC car learns to navigate a complex indoor environment autonomously using only monocular images, with minimal human intervention and a short training period.
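At execution time, a predictive model of this kind is typically used in a model-predictive-control loop: sample candidate action sequences, score each by its predicted collision probabilities, execute the first action of the best sequence, and replan at the next step. The sketch below illustrates this pattern (hypothetical function and parameter names; the authors' exact planner and cost function may differ).

```python
import torch

def select_action(model, image, num_candidates=128, horizon=16, action_dim=2):
    """Hypothetical MPC-style action selection with a collision predictor."""
    # Sample random action sequences in [-1, 1]^action_dim.
    candidates = torch.rand(num_candidates, horizon, action_dim) * 2 - 1
    with torch.no_grad():
        collision_prob = torch.sigmoid(
            model(image.expand(num_candidates, -1, -1, -1), candidates)
        )                                      # (num_candidates, horizon)
    scores = collision_prob.sum(dim=1)         # fewer expected collisions is better
    best = candidates[scores.argmin()]
    return best[0]                             # execute first action; replan next step
```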
The implications of this research extend beyond robot navigation, suggesting that event-specific prediction targets and hybrid model-free/model-based architectures could yield substantial improvements in other AI systems. The focus on self-supervision aligns with broader trends toward reducing human involvement in RL. Future work may extend these strategies to more general settings, such as dynamic outdoor environments, or incorporate more advanced exploration strategies to further improve policy generalization.
In sum, this paper provides a significant contribution to the robot navigation domain by redefining how model-based and model-free elements can be harmoniously integrated via a computation graph framework, paving the way for more adaptable and efficient robotic systems.