
General-purpose foundation models for increased autonomy in robot-assisted surgery

(arXiv:2401.00678)

Published Jan 1, 2024 in cs.RO, cs.LG, and q-bio.TO

Abstract

The dominant paradigm for end-to-end robot learning focuses on optimizing task-specific objectives that solve a single robotic problem such as picking up an object or reaching a target position. However, recent work on high-capacity models in robotics has shown promise toward being trained on large collections of diverse and task-agnostic datasets of video demonstrations. These models have shown impressive levels of generalization to unseen circumstances, especially as the amount of data and the model complexity scale. Surgical robot systems that learn from data have struggled to advance as quickly as other fields of robot learning for a few reasons: (1) there is a lack of existing large-scale open-source data to train models, (2) it is challenging to model the soft-body deformations that these robots work with during surgery because simulation cannot match the physical and visual complexity of biological tissue, and (3) surgical robots risk harming patients when tested in clinical trials and require more extensive safety measures. This perspective article aims to provide a path toward increasing robot autonomy in robot-assisted surgery through the development of a multi-modal, multi-task, vision-language-action model for surgical robots. Ultimately, we argue that surgical robots are uniquely positioned to benefit from general-purpose models and provide three guiding actions toward increased autonomy in robot-assisted surgery.

Overview

  • Robot learning for RAS faces challenges like soft-body modeling and increased risk; general-purpose models may offer solutions.

  • General-purpose models can learn a broad range of skills using self-supervised learning from diverse datasets, beneficial for RAS.

  • The robot transformer (RT) architecture, which fuses language, visual, and sensor inputs, has shown promising generalization on robotics tasks.

  • Surgical robots are well-suited to RTs given their stationary operation, access to high-performance computing, and an abundance of recorded procedures as training data.

  • Advancements may include conservative Q-learning for risk avoidance and conformal prediction for assessing action confidence.

Overview of Robot-Assisted Surgery Learning

Robot learning, the branch of AI concerned with training robots from data, typically optimizes task-specific objectives, such as picking up an object or reaching a target position, using techniques like deep reinforcement learning (DRL). Robot-assisted surgery (RAS), however, presents a unique set of challenges, including the difficulty of modeling soft-body deformation and a higher risk of causing harm, that have hindered the application of these approaches. Recent research indicates that general-purpose models could be key to meeting these challenges, offering a broader range of skills and better generalization across varied tasks.

General-Purpose Models in Robotics

The authors discuss how large-scale, high-capacity models, analogous to foundation models in NLP, could benefit RAS. These models are trained on extensive, diverse datasets with self-supervised learning, which builds broad knowledge and skills without requiring human-labeled data. In robotics, this approach has given rise to the robot transformer (RT) architecture, which combines language instructions, visual input from cameras, and sensor data to learn from offline task demonstrations. RTs have demonstrated a promising ability to generalize to tasks and conditions not covered during training.
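As a rough illustration of the RT idea, the following PyTorch sketch fuses language, image, and sensor tokens in a single transformer encoder and decodes discretized actions, in the spirit of RT-1-style models. All dimensions, layer choices, and the mean-pooling scheme are illustrative assumptions, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class RobotTransformerPolicy(nn.Module):
    """Minimal RT-style policy sketch: fuse language, image, and sensor
    tokens with a transformer encoder, then decode discretized actions."""

    def __init__(self, d_model=256, n_heads=8, n_layers=4,
                 vocab_size=1000, n_action_bins=256, action_dim=7):
        super().__init__()
        # Hypothetical encoders; a real system would use a pretrained
        # language model and a pretrained image backbone (e.g. a ViT).
        self.lang_embed = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(768, d_model)    # patch features -> tokens
        self.sensor_proj = nn.Linear(16, d_model)  # proprioception -> 1 token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # One categorical head per action dimension (binned actions).
        self.action_head = nn.Linear(d_model, action_dim * n_action_bins)
        self.action_dim, self.n_bins = action_dim, n_action_bins

    def forward(self, lang_ids, img_patches, sensors):
        # lang_ids: (B, L) token ids; img_patches: (B, P, 768); sensors: (B, 16)
        tokens = torch.cat([
            self.lang_embed(lang_ids),
            self.img_proj(img_patches),
            self.sensor_proj(sensors).unsqueeze(1),
        ], dim=1)
        h = self.encoder(tokens).mean(dim=1)   # pooled representation
        logits = self.action_head(h)           # (B, action_dim * n_bins)
        return logits.view(-1, self.action_dim, self.n_bins)
```

Trained with behavior cloning, such a model would simply minimize cross-entropy between these per-dimension logits and the binned actions recorded in offline demonstrations.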

The Unique Opportunity for Surgical Robots

Surgical robots are well-suited to RTs because they operate from a fixed base, which alleviates the computation-time and energy-efficiency constraints faced by mobile robots. Since they do not rely on battery power and can interface directly with high-performance computing, they can run far more computationally demanding models. Moreover, the abundance of surgical procedures recorded daily provides a rich, largely untapped source of training data. The authors identify three major challenges: developing risk-avoidant behaviors, unifying medical data across institutions, and pushing safety beyond the quality of existing demonstration data.

Path Forward and Implications

To tackle these challenges, the paper suggests combining conservative Q-learning (CQL), which penalizes optimistic value estimates for actions outside the demonstration data and thereby steers the policy away from high-risk behavior, with conformal prediction, which quantifies the robot's confidence in its actions and can hand control to a human surgeon in uncertain scenarios. Merging medical data across institutions and adding layers of safety assessment tailored to surgical quality could then allow surgical robots to surpass human performance standards.
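For intuition, a minimal discrete-action version of the CQL regularizer (the CQL(H) form) is sketched below in PyTorch: it pushes Q-values down on all actions via a logsumexp term while pushing them up on the actions actually taken in the offline surgical demonstrations. Function and variable names here are hypothetical, not from the paper.

```python
import torch

def conservative_q_loss(q_net, states, actions, td_loss, cql_weight=1.0):
    """Add the CQL(H) regularizer to a standard TD loss.

    q_net(states) returns Q-values of shape (batch, n_actions);
    actions holds the demonstrated actions, shape (batch,).
    """
    q_all = q_net(states)                                      # (B, A)
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)  # Q(s, a_data)
    # logsumexp pushes down Q on out-of-distribution actions, while
    # subtracting q_data keeps demonstrated actions attractive.
    penalty = torch.logsumexp(q_all, dim=1) - q_data
    return td_loss + cql_weight * penalty.mean()
```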
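Conformal prediction can then supply the handoff rule: calibrate a nonconformity threshold on held-out demonstrations so that prediction sets cover the true action with probability roughly 1 - alpha, and defer to the surgeon whenever the model cannot commit to a small set of candidate actions. A minimal split-conformal sketch, assuming a discrete action space and hypothetical names:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with nonconformity 1 - p(true action).

    cal_probs: (n, n_actions) predicted probabilities on calibration data;
    cal_labels: (n,) indices of the demonstrated (true) actions.
    """
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample-corrected (1 - alpha) quantile of the scores.
    k = int(np.ceil((n + 1) * (1.0 - alpha))) - 1
    return scores[min(k, n - 1)]

def act_or_handoff(action_probs, qhat, max_set_size=1):
    """Execute only when the conformal prediction set is small and nonempty;
    otherwise return None to signal a handoff to the human surgeon."""
    pred_set = np.where(1.0 - action_probs <= qhat)[0]
    if 0 < len(pred_set) <= max_set_size:
        return int(pred_set[np.argmax(action_probs[pred_set])])
    return None
```

With max_set_size=1, the robot acts only when exactly one action is conformally plausible; any ambiguity triggers a handoff, matching the paper's proposal of deferring to the surgeon under uncertainty.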

The authors envision an RT-RAS system (an RT model for robot-assisted surgery) that uses real-world data for continual improvement and expanded autonomous capabilities. This could lead to more consistent surgeries and reduced costs. Such autonomous systems could also transform surgical training and safety by providing trainee surgeons with immediate expert feedback and built-in safety oversight. Realizing these benefits will require a collaborative effort among academia, healthcare institutions, and industry.
