
Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

(2405.07503)
Published May 13, 2024 in cs.RO and cs.AI

Abstract

Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across six simulation tasks as well as two real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Consistency Policy training procedure is robust to the pretrained Diffusion Policy's quality, a useful result that helps practitioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps. Code and training details will be released publicly.

Figure: Illustration of a Diffusion Policy's mechanism in probabilistic decision-making systems.

Overview

  • Consistency Policy is a novel approach to accelerate visuomotor policy inference in robotic systems by distilling a pretrained Diffusion Policy into a more computationally efficient model, making it viable for environments with limited computational power.

  • The training process involves a teacher model based on the EDM framework and a student model, the Consistency Trajectory Model (CTM), which learns to make fast yet accurate predictions by enforcing self-consistency along the teacher's learned trajectories.

  • Evaluation on various simulation and real-world tasks showed that Consistency Policy significantly improves inference speed without compromising performance, making it suitable for high-demand robotic applications.

Accelerated Visuomotor Policies via Consistency Distillation

Introduction to Consistency Policy

Robotic systems, whether mobile or stationary, often can't afford the luxury of high-end GPUs due to space, weight, and power constraints. This limitation creates a roadblock for leveraging advanced visuomotor policy architectures that need extensive computational resources for fast policy inference. Enter Consistency Policy, a new approach designed to provide a faster yet competitively performing alternative to traditional Diffusion Policies.

The main idea behind Consistency Policy is to distill a pretrained Diffusion Policy into a more efficient model by enforcing self-consistency along the learned trajectories. This allows the distilled model to make decisions much quicker and with less computational power.

How It Works

Diffusion Models: A Quick Primer

Before diving into Consistency Policy, let's get a handle on diffusion models, which have shown impressive results in imitation learning for robotic control. In essence, they start with a noisy initial state and sequentially denoise it to produce the desired action. This typically requires multiple steps and significant computational power.
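
For intuition, here is a minimal sketch of that iterative denoising loop, written as a DDPM-style ancestral sampler conditioned on the current observation. The `denoiser` network, the noise schedule `betas`, and the action shape are hypothetical placeholders, not the paper's actual architecture.

```python
import torch

def sample_action_ddpm(denoiser, obs, betas, action_shape):
    """Start from pure Gaussian noise and denoise step by step.
    `denoiser(x, t, obs)` is assumed to predict the noise injected at step t."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    T = len(betas)

    x = torch.randn(action_shape)               # a_T ~ N(0, I)
    for t in reversed(range(T)):                # T network evaluations -> slow
        eps_hat = denoiser(x, t, obs)           # predict the injected noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                    # the denoised action sequence a_0
```

Each of the T iterations is a full network forward pass, which is exactly where the latency discussed next comes from.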

The Need for Speed

Diffusion models are effective but slow. For example, Diffusion Policies using Denoising Diffusion Probabilistic Models (DDPM) perform multiple forward evaluations, taking around one second per action generation on an NVIDIA T4 GPU. This latency is impractical for robots that need quick decision-making capabilities, such as dynamic object manipulation or agile navigation.

Enter Consistency Policy

Training Process

  1. Teacher Model (EDM Framework): The first step involves training a teacher model using an efficient diffusion framework called EDM. This model learns to predict actions by progressively denoising a sequence of noisy inputs.

  2. Consistency Trajectory Model (CTM): Next, a student model is distilled from the teacher by enforcing self-consistency along the learned trajectories. The idea is to train the student model to generate the same predicted actions when given different points on the same trajectory, which drastically reduces the number of denoising steps needed at inference time (see the sketch after this list).
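
A minimal sketch of one consistency-distillation update, under simplifying assumptions: a student `student(x_t, t, s, obs)` that maps a noisy action at level t to a prediction at a lower level s, a frozen `teacher_step` that takes one ODE step along the pretrained Diffusion Policy's trajectory, and an EMA copy of the student as the target network. All names and interfaces here are hypothetical; this is illustrative rather than the paper's exact CTM objective.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_step(student, student_ema, teacher_step,
                                  x_t, t, s, obs, optimizer):
    """One illustrative distillation update: the student's jump from (x_t, t)
    to level s should match the EMA target's jump from a point slightly
    further along the same teacher trajectory."""
    u = t - (t - s) / 2.0                       # an intermediate noise level between s and t
    with torch.no_grad():
        x_u = teacher_step(x_t, t, u, obs)      # one ODE step along the teacher's trajectory
        target = student_ema(x_u, u, s, obs)    # target prediction at level s

    pred = student(x_t, t, s, obs)              # student prediction at level s
    loss = F.mse_loss(pred, target)             # enforce self-consistency along the trajectory

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the EMA weights are updated toward the student after each step, and the noise levels (t, s, u) are drawn from a preset schedule.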

Inference Speed

Consistency Policy offers two primary modes of inference:

  1. Single-Step Inference: For ultra-low latency requirements, the model can predict actions in just one step.
  2. Three-Step Inference: This trades off a bit of speed for higher accuracy by chaining denoising steps at three preset noise levels (a sketch of both modes follows this list).
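
Here is a minimal sketch of how the two modes might look at run time, assuming a distilled model `policy(x, t, s, obs)` that jumps from noise level t to level s in a single call. The function names and the chaining schedule `noise_levels` are illustrative assumptions, not the paper's exact preset values.

```python
import torch

def infer_single_step(policy, obs, action_shape, t_max=80.0, t_min=0.0):
    """Single-step inference: one network call from pure noise to a clean action."""
    x = t_max * torch.randn(action_shape)            # sample at the highest noise level
    return policy(x, t_max, t_min, obs)

def infer_chained(policy, obs, action_shape, noise_levels=(80.0, 2.0, 0.5), t_min=0.0):
    """Few-step inference: denoise, partially re-noise at preset levels, denoise again.
    Trades a little latency for accuracy; `noise_levels` is an illustrative schedule."""
    x = noise_levels[0] * torch.randn(action_shape)
    action = policy(x, noise_levels[0], t_min, obs)
    for t in noise_levels[1:]:
        x = action + t * torch.randn_like(action)    # jump back up to noise level t
        action = policy(x, t, t_min, obs)            # denoise from t in one call
    return action
```

Even the three-step variant costs only three forward passes, versus the tens or hundreds of passes a standard diffusion sampler needs.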

Strong Numerical Results

The evaluation of Consistency Policy covered six simulation tasks and two real-world tasks, showing significant improvements in inference speed without a meaningful drop in performance.

Simulation Tasks

Robomimic Tasks (Lift, Can, Square, Tool Hang): The results demonstrated that Consistency Policy could achieve success rates comparable to DDPM and DDIM in much less time. For instance:

  • Lift: 100% success rate with 1-step inference, maintaining parity with the best-performing methods.
  • Can: 98% success rate with 1-step inference, surpassing DDIM.

Franka Kitchen & Push-T Tasks: Here, single-step Consistency Policy performed commendably, showing that it can handle both long-horizon and multi-stage tasks efficiently.

Real-World Applications

  1. Trash Clean Up Task: Consistency Policy maintained an 80% success rate while significantly speeding up inference (21 ms compared to 192 ms for DDIM).
  2. Plug Insertion Task: Similar trends were observed, highlighting the method’s robustness even in more intricate, contact-rich tasks.

Implications and Future Developments

Practical Implications: Consistency Policy opens the door for employing advanced visuomotor policies in resource-constrained environments. This makes high-level robot control feasible on devices with limited computational capabilities.

Theoretical Implications: The robust performance across varying teacher model qualities indicates that extensive fine-tuning of pretrained models might be unnecessary, simplifying the setup process.

Future Directions: The success of Consistency Policy suggests several avenues for further research:

  • Multimodality Improvements: Exploring richer sampling schemes to recover multimodal behaviors that can be lost during distillation.
  • General Applicability: Extending the approach to other forms of robot policies and exploring its efficacy with other architectures like transformers.

Conclusion

In summary, Consistency Policy represents a leap forward for practical, efficient robot control, achieving substantial gains in inference speed while retaining competitive success rates across a variety of robotic tasks. Its balance of speed and performance makes it a promising tool for the broader application of AI in robotics.
