Emergent Mind

A Multifidelity Sim-to-Real Pipeline for Verifiable and Compositional Reinforcement Learning

(2312.01249)
Published Dec 2, 2023 in cs.RO, cs.AI, cs.SY, and eess.SY

Abstract

We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, in order to deploy reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows for the independent training and testing of the corresponding subtask policies, while simultaneously providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies using a multifidelity simulation pipeline, the framework not only allows for efficient RL training, but also for a refinement of the subtasks and their interfaces in response to challenges arising from discrepancies between simulation and reality. In an experimental case study we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.

Overview

  • The paper proposes a compositional, verifiable framework for training RL systems that decomposes complex tasks into subtasks, easing transfer from simulation to real-world deployment.

  • It introduces a multifidelity sim-to-real pipeline that uses both low- and high-fidelity simulations to train and test subtask policies under increasingly realistic conditions.

  • The framework includes a high-level model (HLM) that oversees the training process, ensuring the composite policy meets the system's performance guarantees.

  • A case study involving a Warthog unmanned ground robot demonstrates the practical application and benefits of the framework.

  • Future work intends to expand the framework to multi-robot systems and vision-based tasks and may use temporal logic for task specifications.

Introduction to Reinforcement Learning Systems

Reinforcement learning (RL) has shown promise in training systems to perform complex tasks, such as robotic control, in varied environments. However, transferring policies trained in simulation to real-world hardware introduces challenges: discrepancies between simulation and reality can produce behaviors that fall short of the expectations set during virtual training. Moreover, ensuring that these systems meet predefined performance criteria with high certainty is difficult, given the complexity of the tasks and the long time horizons over which they must operate.

A Compositional and Verifiable Approach

The paper discusses a compositional framework that trains and verifies RL systems within what is termed a multifidelity sim-to-real pipeline. The framework decomposes complex tasks into smaller subtasks. These subtasks are individually trained via RL algorithms in simulations and are then composed to accomplish the overall task. The framework employs a high-level model (HLM) that oversees this process, breaking down task specifications into subtask specifications, and using learned subtask capabilities to update itself, ensuring that the composite policy adheres to system guarantees.
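
The composition idea above can be illustrated with a minimal sketch, not taken from the paper: each subtask policy carries an exit condition standing in for its mathematical interface, and a composite policy switches subtasks once the current interface is satisfied. All names (`SubtaskPolicy`, `drive`, `hold`) and the toy 1-D dynamics are illustrative assumptions.

```python
# Hypothetical sketch of composing subtask policies via interface conditions.
# The real framework's interfaces and HLM are richer; this only shows the
# switching structure.

class SubtaskPolicy:
    """A trained subtask policy with an exit condition (its interface)."""
    def __init__(self, name, act, done):
        self.name = name
        self.act = act    # state -> action
        self.done = done  # state -> bool: interface/exit condition met

class CompositePolicy:
    """Executes subtask policies in sequence, switching at interfaces."""
    def __init__(self, subtasks):
        self.subtasks = subtasks
        self.idx = 0

    def act(self, state):
        # Advance to the next subtask once the current interface is satisfied.
        while (self.idx < len(self.subtasks) - 1
               and self.subtasks[self.idx].done(state)):
            self.idx += 1
        return self.subtasks[self.idx].act(state)

# Toy 1-D example: drive position toward 5.0, then hold it there.
drive = SubtaskPolicy("drive",
                      act=lambda s: 1.0 if s < 5.0 else 0.0,
                      done=lambda s: s >= 5.0)
hold = SubtaskPolicy("hold", act=lambda s: 0.0, done=lambda s: False)

policy = CompositePolicy([drive, hold])
state = 0.0
for _ in range(10):
    state += policy.act(state)
print(state)  # 5.0: "drive" reaches its interface, then "hold" takes over
```

Because each subtask only promises its interface condition, a policy can be retrained or replaced without touching the others, which is the property the case study later exploits.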

Multifidelity Simulation for Enhanced Realism

To address challenges stemming from the sim-to-real gap, the paper introduces a multifidelity simulation pipeline comprising low- and high-fidelity simulations. Low-fidelity simulations model the robot's dynamics and underlying physics for fast policy training, while high-fidelity simulations incorporate the full autonomy software stack, adding factors absent from the low-fidelity setting, such as sensor noise and asynchronous communication, that are present in the real-world environment. Performance assessments in these simulations inform the iterative process of policy improvement and HLM updates.
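
The train-then-verify loop described above can be sketched as follows. This is a toy model under stated assumptions, not the paper's implementation: "training" just returns a stub policy with a success probability, "high-fidelity evaluation" is a seeded Monte Carlo estimate with a small realism penalty, and all subtask names, probabilities, and thresholds are illustrative.

```python
import random

def train_low_fidelity(subtask):
    """Stand-in for an RL training run in the fast, low-fidelity simulator."""
    return {"subtask": subtask, "success_prob": 0.97}

def evaluate_high_fidelity(policy, episodes=200, rng=None):
    """Monte Carlo estimate of success rate under added realism
    (in this toy model, a flat penalty standing in for sensor noise,
    communication delays, etc.)."""
    rng = rng or random.Random(0)
    p = policy["success_prob"] - 0.02
    successes = sum(rng.random() < p for _ in range(episodes))
    return successes / episodes

def sim_to_real_pipeline(subtasks, threshold=0.85, max_rounds=3):
    """Accept a subtask policy only if its verified success rate meets
    the threshold the high-level model assumes; otherwise retrain."""
    accepted = {}
    for name in subtasks:
        policy = train_low_fidelity(name)
        for _ in range(max_rounds):
            rate = evaluate_high_fidelity(policy)
            if rate >= threshold:
                accepted[name] = rate
                break
            policy = train_low_fidelity(name)  # refine and retry
    return accepted

results = sim_to_real_pipeline(["waypoint_following", "turning"])
```

The key structural point is that verification happens per subtask, so a shortfall revealed at higher fidelity triggers retraining of that subtask alone rather than of the whole composite system.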

Validating with a Case Study

The practical application of the framework is demonstrated in a case study involving a Warthog unmanned ground robot. Using the Unity engine for low-fidelity simulation and high-fidelity software-in-the-loop simulations for integration testing, a set of RL policies was developed and successfully deployed to control the robot's navigation. Notably, when discrepancies arose between simulation and reality, the framework enabled targeted retraining of specific subtask policies to handle real-world dynamics, without retraining the entire system.

Conclusion and Future Work

In summary, the proposed framework presents a structured method for training and verifying RL policies for robotic systems via a step-wise simulation approach, leading to reliable and adaptable deployment on physical hardware. Future work aims to extend the system's capabilities to multi-robot systems and vision-based tasks, amongst other complex robotic applications, potentially leveraging temporal logic for task specifications. The findings support the framework's potential in reducing the friction between simulated training and real-world application for autonomous systems.
