A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies (1810.04871v1)

Published 11 Oct 2018 in cs.RO, cs.AI, cs.CV, and cs.LG

Abstract: Learning effective visuomotor policies for robots purely from data is challenging, but also appealing since a learning-based system should not require manual tuning or calibration. In the case of a robot operating in a real environment the training process can be costly, time-consuming, and even dangerous since failures are common at the start of training. For this reason, it is desirable to be able to leverage simulation and off-policy data to the extent possible to train the robot. In this work, we introduce a robust framework that plans in simulation and transfers well to the real environment. Our model incorporates a gradient-descent based planning module, which, given the initial image and goal image, encodes the images to a lower dimensional latent state and plans a trajectory to reach the goal. The model, consisting of the encoder and planner modules, is trained through a meta-learning strategy in simulation first. We subsequently perform adversarial domain transfer on the encoder by using a bank of unlabelled but random images from the simulation and real environments to enable the encoder to map images from the real and simulated environments to a similarly distributed latent representation. By fine tuning the entire model (encoder + planner) with far fewer real world expert demonstrations, we show successful planning performances in different navigation tasks.

Summary

The paper presents a framework that efficiently trains visuomotor navigation policies by combining simulation data with limited real-world fine-tuning.
It uses adversarial domain adaptation to align the encodings of simulated and real images, and a stochastic forward dynamics model to handle real-world uncertainty.
Experiments on lane-following and left-turn tasks show robust performance with far fewer real-world demonstrations.

Introduction

The paper introduces a framework designed to efficiently train visuomotor policies for robotic navigation by combining simulation and real-world data. Its main goal is to reduce the cost and risk of training robots in the real world: the model is meta-learned in simulation and then transferred to the real robot with adversarial domain adaptation.

Methodology

General Framework

The proposed method has three stages: training the encoder and planner in a simulated environment, adapting the encoder to real images through adversarial domain adaptation on unlabelled images from both domains, and fine-tuning the whole model with a small number of real-world expert demonstrations.

Figure 1: Sim-to-real transfer of navigation policies, illustrating the main elements such as simulation-based training, adversarial discriminative transfer, and real environment fine-tuning.

Stochastic Forward Dynamics Model

A deterministic dynamics model cannot capture the stochasticity of the real world. The forward dynamics model is therefore made stochastic: state transitions are modelled probabilistically, which improves robustness when the policy is deployed on real robots.
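
As a rough illustration of what a stochastic latent forward model can look like, the sketch below predicts a Gaussian over the next latent state and samples from it. The linear mean, the fixed per-dimension log standard deviation, and every name used here (`predict_mean`, `sample_next_state`, `A`, `B`, `log_std`) are assumptions made for illustration; this summary does not specify how the paper parameterizes its model.

```python
import math
import random

def predict_mean(z, a, A, B):
    """Mean of the next latent state under a linear model: A z + B a."""
    return [
        sum(A[i][j] * z[j] for j in range(len(z)))
        + sum(B[i][k] * a[k] for k in range(len(a)))
        for i in range(len(A))
    ]

def sample_next_state(z, a, A, B, log_std, rng):
    """Sample z' from a diagonal Gaussian as mean + std * noise."""
    mean = predict_mean(z, a, A, B)
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]

# Hypothetical 2-D latent state and 1-D action.
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.5]]
log_std = [-2.0, -2.0]
rng = random.Random(0)

z, a = [0.0, 1.0], [0.2]
samples = [sample_next_state(z, a, A, B, log_std, rng) for _ in range(1000)]
empirical_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
```

Repeated calls with the same state and action return different next states, and their average approaches the predicted mean. A planner that rolls out such a model sees a spread of outcomes instead of one trajectory.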

Learning the Planning Loss Function

To improve adaptability, the authors use a multi-layer perceptron (MLP) as a learnable planning loss function. The gradient-descent planner minimizes this loss in the inner loop, while the MLP's parameters are optimized in an outer loop through imitation learning, so that the loss adapts to the requirements of the task.
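
The inner/outer loop structure can be sketched in a few lines. Everything in this example is a simplifying assumption and not the paper's implementation: the additive latent dynamics, the shape of the small MLP loss, the hand-set weights standing in for learned ones, the finite-difference gradients (a real system would use automatic differentiation), and the mean-squared imitation loss. Only the inner loop is run; the outer loop is represented by the objective it would minimize.

```python
import math

def rollout(z0, actions):
    """Toy latent dynamics (an assumption): z_{t+1} = z_t + a_t."""
    z = list(z0)
    for a in actions:
        z = [zi + ai for zi, ai in zip(z, a)]
    return z

def mlp_planning_loss(diff, W1, w2):
    """One-hidden-layer MLP applied to (final latent - goal latent).

    Squaring the tanh units and using positive output weights keeps the
    loss non-negative with its minimum at diff = 0. This shape is a
    choice made for the sketch, not something taken from the paper.
    """
    hidden = [math.tanh(sum(w * d for w, d in zip(row, diff))) for row in W1]
    return sum(w * h * h for w, h in zip(w2, hidden))

def plan(z0, z_goal, W1, w2, horizon=3, steps=200, lr=0.1, eps=1e-5):
    """Inner loop: gradient descent on the action sequence."""
    dim = len(z0)
    actions = [[0.0] * dim for _ in range(horizon)]

    def objective(acts):
        z_final = rollout(z0, acts)
        return mlp_planning_loss([f - g for f, g in zip(z_final, z_goal)], W1, w2)

    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(horizon)]
        for t in range(horizon):
            for i in range(dim):
                original = actions[t][i]
                actions[t][i] = original + eps
                up = objective(actions)
                actions[t][i] = original - eps
                down = objective(actions)
                actions[t][i] = original
                grads[t][i] = (up - down) / (2 * eps)  # central difference
        actions = [[a - lr * g for a, g in zip(a_t, g_t)]
                   for a_t, g_t in zip(actions, grads)]
    return actions

def imitation_loss(planned, expert):
    """Outer-loop objective (assumed MSE) between planned and expert actions."""
    n = sum(len(a) for a in planned)
    return sum((p - e) ** 2 for a_p, a_e in zip(planned, expert)
               for p, e in zip(a_p, a_e)) / n

W1 = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for learned weights
w2 = [1.0, 1.0]
z0, z_goal = [0.0, 0.0], [1.0, -0.5]

actions = plan(z0, z_goal, W1, w2)
z_final = rollout(z0, actions)
expert = [[1.0 / 3, -0.5 / 3]] * 3  # hypothetical expert demonstration
outer_loss = imitation_loss(actions, expert)
```

In a full implementation, `outer_loss` would be differentiated with respect to `W1` and `w2` through the planning steps, so that the planner's output moves toward the expert's actions.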

Regularization Strategies

The paper adds smoothness and consistency regularization to speed up convergence. Smoothness regularization keeps consecutive states close to each other in latent space, while consistency regularization keeps the states propagated through the dynamics model close to the direct encodings of the corresponding images.
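
One plausible form of the two regularizers is a mean squared distance in latent space, sketched below. The squared-distance choice, the averaging, and the function names are assumptions; the summary does not give the exact expressions used in the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two latent vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def smoothness_penalty(latents):
    """Mean squared distance between consecutive latent states."""
    pairs = list(zip(latents[:-1], latents[1:]))
    return sum(squared_distance(a, b) for a, b in pairs) / len(pairs)

def consistency_penalty(encoded, propagated):
    """Mean squared distance between image encodings and propagated states."""
    pairs = list(zip(encoded, propagated))
    return sum(squared_distance(e, p) for e, p in pairs) / len(pairs)

# Hypothetical 2-D latent trajectory obtained by encoding three images.
encoded = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
# Hypothetical states predicted by the dynamics model for steps 1 and 2.
propagated = [[1.0, 0.0], [1.0, 1.0]]

smooth = smoothness_penalty(encoded)                    # (1 + 4) / 2 = 2.5
consistent = consistency_penalty(encoded[1:], propagated)  # (0 + 1) / 2 = 0.5
```

Both terms would typically be added to the training objective with small weights, which are further hyperparameters not specified here.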

Sim-to-Real Transfer Approach

The sim-to-real transfer uses GAN-style adversarial training to match the latent encodings of real and simulated images, drawing on unlabelled images from both domains. Aligning the two embedding distributions allows the dynamics model, which was trained in simulation, to generalize to real-world conditions.
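
The two competing objectives in this kind of adversarial alignment can be written down compactly. The sketch below assumes a logistic-regression discriminator over latent vectors and the common setup in which the real-image encoder is trained to make its latents look simulated. The function names, the linear discriminator, and the toy 1-D latents are all hypothetical, and the paper's actual losses may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob_sim(z, w, b):
    """Discriminator: probability that latent z came from a simulated image."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

def discriminator_loss(sim_latents, real_latents, w, b):
    """Binary cross-entropy with simulated latents labelled 1, real latents 0."""
    sim_term = -sum(math.log(prob_sim(z, w, b)) for z in sim_latents)
    real_term = -sum(math.log(1.0 - prob_sim(z, w, b)) for z in real_latents)
    return sim_term / len(sim_latents) + real_term / len(real_latents)

def encoder_loss(real_latents, w, b):
    """Loss for the real-image encoder: be classified as simulated."""
    total = -sum(math.log(prob_sim(z, w, b)) for z in real_latents)
    return total / len(real_latents)

# Toy 1-D latents and a fixed discriminator (hypothetical values).
w, b = [1.0], 0.0
sim_latents = [[2.0], [3.0]]
real_far = [[-2.0], [-3.0]]      # real encodings before alignment
real_aligned = [[2.1], [2.9]]    # real encodings after alignment

loss_far = encoder_loss(real_far, w, b)
loss_aligned = encoder_loss(real_aligned, w, b)
```

Training alternates between lowering `discriminator_loss` with respect to the discriminator parameters and lowering `encoder_loss` with respect to the real-image encoder. Here `loss_aligned` is smaller than `loss_far`, which is the signal that pulls real encodings toward the simulated distribution.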

Experimental Setup

Environment

Experiments are conducted in the Duckietown simulator and on the physical testbed to evaluate policy performance in lane-following and left-turn scenarios. Policies are evaluated from a diverse set of initial conditions, which allows a comprehensive assessment of both the trained policy and the transfer strategy.

Results

Performance in Simulation

The planner performs well in simulation, converging quickly and achieving high average rewards on the lane-following and left-turn tasks. Model A, which integrates all of the proposed components, significantly outperformed the baseline methods.

Figure 2: Evaluation of the inner loop loss function on Duckietown Simulator, demonstrating robustness across various horizon lengths trained via curriculum learning.

Real Robot Deployment

Real-world tests validate the sim-to-real transfer: the model achieves completion rates above 50% even in challenging configurations. The adversarial adaptation technique also proved effective, requiring fewer real-world iterations than direct training.

Figure 3: Evaluation of the performance of adversarial transfer in terms of loss as a function of the number of real and simulated images used in training, after pre-training in simulation with 2000 expert demos.

Conclusion

The paper presents a robust framework for sim-to-real transfer that significantly reduces the number of real-world demonstrations required. The method makes robotic navigation policies easier and more efficient to train, which may extend its applicability to more complex robotic tasks. Future work may include refining the adversarial transfer process to further reduce real-world training data requirements and exploring additional applications within mobile robotics.