From virtual demonstration to real-world manipulation using LSTM and MDN (1603.03833v4)

Published 12 Mar 2016 in cs.RO, cs.AI, and cs.LG

Abstract: Robots assisting the disabled or elderly must perform complex manipulation tasks and must adapt to the home environment and preferences of their user. Learning from demonstration is a promising choice, that would allow the non-technical user to teach the robot different tasks. However, collecting demonstrations in the home environment of a disabled user is time consuming, disruptive to the comfort of the user, and presents safety challenges. It would be desirable to perform the demonstrations in a virtual environment. In this paper we describe a solution to the challenging problem of behavior transfer from virtual demonstration to a physical robot. The virtual demonstrations are used to train a deep neural network based controller, which is using a Long Short Term Memory (LSTM) recurrent neural network to generate trajectories. The training process uses a Mixture Density Network (MDN) to calculate an error signal suitable for the multimodal nature of demonstrations. The controller learned in the virtual environment is transferred to a physical robot (a Rethink Robotics Baxter). An off-the-shelf vision component is used to substitute for geometric knowledge available in the simulation and an inverse kinematics module is used to allow the Baxter to enact the trajectory. Our experimental studies validate the three contributions of the paper: (1) the controller learned from virtual demonstrations can be used to successfully perform the manipulation tasks on a physical robot, (2) the LSTM+MDN architectural choice outperforms other choices, such as the use of feedforward networks and mean-squared error based training signals and (3) allowing imperfect demonstrations in the training set also allows the controller to learn how to correct its manipulation mistakes.

Citations (14)


Summary

The paper introduces an LSTM-MDN architecture that enables effective simulator-to-reality transfer for robot manipulation tasks.
The method outperforms feedforward networks trained with a mean-squared error (MSE) signal by combining the LSTM's memory with the MDN's ability to represent multimodal demonstrations.
The paper reveals that using imperfect demonstrations enhances system robustness, allowing controllers to self-correct in real-world operations.

From Virtual Demonstration to Real-World Manipulation Using LSTM and MDN

This paper investigates a methodology for transferring robot manipulation skills learned in a virtual environment to a physical setting. The focus is on assistive robotics, where manipulation tasks such as picking and placing objects or pushing them to a desired pose are crucial. The authors propose a learning-from-demonstration (LfD) approach that trains a neural network controller on virtual demonstrations and then deploys it on a physical robot.

The neural network architecture combines Long Short Term Memory (LSTM) layers with a Mixture Density Network (MDN). The LSTM's memory captures the sequential nature of manipulation tasks, which often require committing to a specific sequence of actions. The MDN supplies an error signal suited to multimodal demonstrations: when several solutions to a task are equally viable, the controller can represent them separately instead of collapsing into a suboptimal average of them, a failure typical of models trained with a mean-squared error loss.
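The averaging problem can be illustrated with a toy one-dimensional example. The sketch below is not the paper's network: the data, the hand-picked mixture parameters, and the helper names (gaussian_pdf, mixture_nll, average_nll, sample_mixture) are all assumptions made for illustration. It compares the single value that minimises mean-squared error with a two-component Gaussian mixture on demonstrations that split evenly between two modes.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_nll(x, weights, mus, sigmas):
    """Negative log-likelihood of x under a 1-D Gaussian mixture."""
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
    return -math.log(density + 1e-12)  # small constant guards against log(0)


def average_nll(targets, weights, mus, sigmas):
    """Mean negative log-likelihood over all targets."""
    return sum(mixture_nll(t, weights, mus, sigmas) for t in targets) / len(targets)


def sample_mixture(weights, mus, sigmas, rng):
    """Pick a component by weight, then sample from that component."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(mus[k], sigmas[k])


# Toy demonstrations (assumed): half pass an obstacle on the left (-1),
# half pass it on the right (+1).
targets = [-1.0, 1.0] * 50

# The single value that minimises mean-squared error is the mean of the
# targets, which lies between the two modes and matches no demonstration.
mse_prediction = sum(targets) / len(targets)

# Hand-picked (not trained) parameters: (weights, means, standard deviations).
two_mode = ([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
one_mode = ([1.0], [mse_prediction], [1.0])  # best single Gaussian for this data

nll_two = average_nll(targets, *two_mode)
nll_one = average_nll(targets, *one_mode)

# Sampling from the mixture commits to one mode instead of averaging them.
rng = random.Random(0)
samples = [sample_mixture(*two_mode, rng) for _ in range(200)]

print("MSE-optimal prediction:", mse_prediction)
print("average NLL, two modes:", nll_two)
print("average NLL, one mode: ", nll_one)
print("samples near the average (|x| < 0.5):", sum(abs(s) < 0.5 for s in samples))
```

In an actual MDN the mixture weights, means, and variances would be outputs of the network and the negative log-likelihood would serve as the training loss; here they are fixed by hand so that the comparison can be read directly.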

The paper presents three claims backed by experimental validation:

Simulator to Reality Transfer: The controller trained in a virtual simulator operated successfully in the real world when deployed on a Rethink Robotics Baxter robot. It executed the tasks in the physical environment despite the inevitable discrepancies between simulated and real-world physics.
Superior LSTM and MDN Architecture: A comparative analysis indicated that the LSTM-MDN architecture outperformed simpler alternatives such as feedforward networks trained with a mean-squared error signal. The result held across both tasks studied, "pick and place" and "pushing to desired pose", which supports this design choice.
Utility of Imperfect Demonstrations: An unconventional but insightful result was that retaining imperfect demonstrations, where human demonstrators corrected their errors, benefited the system. It appeared to induce robustness, enabling the controller to self-correct when deviations from the expected trajectory occurred during real-world operation.
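The deployment setup behind the first claim (a vision component standing in for the simulator's geometric knowledge, a controller producing the trajectory, and an inverse kinematics module enacting it) can be sketched as a simple loop. Everything below is a hypothetical stand-in: estimate_object_pose, controller_step, solve_ik, and ControllerState are illustrative names, and the stub policy simply moves halfway to the object at each step, where the paper uses a trained LSTM+MDN controller.

```python
from dataclasses import dataclass


@dataclass
class ControllerState:
    """Stand-in for the recurrent (LSTM) state; here it only counts steps."""
    steps: int = 0


def estimate_object_pose(true_pose):
    """Hypothetical vision stub: returns the pose the simulator would know exactly."""
    return true_pose


def controller_step(gripper, obj, state):
    """Hypothetical policy stub: move halfway toward the object.

    A trained LSTM+MDN controller would replace this function.
    """
    state.steps += 1
    waypoint = tuple(g + 0.5 * (o - g) for g, o in zip(gripper, obj))
    return waypoint, state


def solve_ik(waypoint):
    """Hypothetical IK stub: assume the arm reaches the requested waypoint exactly."""
    return waypoint


def run_episode(gripper, true_obj, max_steps=20, tol=1e-3):
    """Closed loop: perceive, predict the next waypoint, then move the arm."""
    state = ControllerState()
    for _ in range(max_steps):
        obj = estimate_object_pose(true_obj)
        waypoint, state = controller_step(gripper, obj, state)
        gripper = solve_ik(waypoint)
        if max(abs(g - o) for g, o in zip(gripper, obj)) < tol:
            break
    return gripper, state.steps


final_pose, steps_taken = run_episode((0.0, 0.0, 0.0), (0.4, 0.2, 0.1))
print("final gripper position:", final_pose)
print("steps taken:", steps_taken)
```

Because the loop re-reads the object pose at every step, a controller that has learned corrective behaviour has the chance to recover from a deviation, which is the situation the third claim addresses.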

The implications of this work lie in enhancing the autonomy of assistive robots in practical home environments, which is vital for addressing the diverse and dynamic needs of elderly and disabled users. By leveraging virtual environments, the approach mitigates the risks and discomforts associated with physical demonstrations, while effectively creating a transferable manipulation policy.

This research offers a basis for further studies into integrating LfD with reinforcement learning techniques, potentially leading to more adaptable assistive robotic systems. Future work could focus on multi-task learning, scaling up task complexity, and refining the end-to-end learning process, possibly by incorporating more advanced vision-to-control components. Refining virtual-to-real transfer is likely to remain important for advancing autonomous systems in structured and unstructured environments alike.
