Building general-purpose robots that can operate seamlessly in any environment, with any object, and using various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. However, most existing robotic systems are constrained: they are designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively labeled data, rely on task-specific models, generalize poorly when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how existing foundation models from NLP and CV can be applied to robotics and (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work that leverages existing foundation models for robotics and develops models tailored to robotics. Finally, we discuss key challenges and promising future directions in using foundation models to enable general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.
Foundation models in robotics aim to overcome data scarcity and enhance generalizability, leveraging successes from NLP and CV.
These models can potentially ease the creation of adaptable, intelligent robots capable of operating in varied environments.
Challenges like task specification, model dependency, and safety are being addressed by the properties of foundation models.
Existing research concentrates on pick-and-place tasks and points to a need for better simulations, real-world data, and unified performance benchmarks.
Future exploration includes enhanced grounding, continual learning, cross-embodiment adaptability, and hardware innovations.
Robotics has long focused on systems built for particular tasks, trained on specific datasets, and limited to defined environments. These systems often suffer from data scarcity, poor generalization, and limited robustness in real-world scenarios. Encouraged by the success of foundation models in NLP and computer vision (CV), researchers are now exploring their application to robotics. Foundation models such as large language models (LLMs), Vision Foundation Models (VFMs), and others have qualities that align well with the vision of general-purpose robots, that is, robots that can operate across various tasks and environments without extensive retraining.
Robotic systems comprise several core functionalities: perception, decision-making and planning, and action generation. Each presents its own challenges. For example, perception systems need varied data to understand scenes and objects, while planning and control must adapt to new environments. Foundation models are being brought into this domain to leverage their strong generalization and learning abilities against these hurdles, potentially smoothing the path toward truly adaptable and intelligent robotic systems.
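The three functionalities named above can be pictured as stages of a pipeline. The following is a minimal sketch under stated assumptions, not an implementation from the survey: every name in it (`Observation`, `Action`, `toy_perceive`, `toy_plan`, `toy_act`, `run_pipeline`) is hypothetical, and the toy stages are simple string operations standing in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """Hypothetical perception output: names of objects seen in the scene."""
    objects: List[str]


@dataclass
class Action:
    """Hypothetical low-level command, e.g. Action("pick", "cup")."""
    verb: str
    target: str


def toy_perceive(raw_scene: str) -> Observation:
    # Stand-in for a perception model: treat each word as a detected object.
    return Observation(objects=raw_scene.split())


def toy_plan(instruction: str, obs: Observation) -> List[str]:
    # Stand-in for a planner: emit a pick and a place step for every
    # detected object that the instruction mentions by name.
    words = instruction.lower().split()
    steps: List[str] = []
    for obj in obs.objects:
        if obj in words:
            steps += [f"pick {obj}", f"place {obj}"]
    return steps


def toy_act(step: str) -> Action:
    # Stand-in for action generation: parse a textual step into a command.
    verb, target = step.split(maxsplit=1)
    return Action(verb, target)


def run_pipeline(
    instruction: str,
    raw_scene: str,
    perceive: Callable[[str], Observation] = toy_perceive,
    plan: Callable[[str, Observation], List[str]] = toy_plan,
    act: Callable[[str], Action] = toy_act,
) -> List[Action]:
    # Each stage is passed in as a callable, so any one of them could be
    # replaced (for example by a model-backed component) independently.
    obs = perceive(raw_scene)
    steps = plan(instruction, obs)
    return [act(step) for step in steps]


actions = run_pipeline("move the cup", "cup plate")
print(actions)
```

Keeping the stages as separate callables is only one possible design; it is chosen here to make the point that each functionality can be swapped without changing the others.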
Foundation models are being applied to several classical challenges in robotics, including task specification, model dependency, and safety.
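Numerous studies have explored applying foundation models to various tasks, leading to several observations: existing work concentrates on pick-and-place tasks, and there is a need for better simulations, real-world data, and unified performance benchmarks.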
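Looking ahead, several areas are ripe for exploration, including enhanced grounding, continual learning, cross-embodiment adaptability, and hardware innovations.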
The application of foundation models to robotics holds the promise of achieving a higher level of autonomy, adaptability, and intelligence in robotic systems. As the field progresses, the blend of robust AI models and robotics could usher in a new era of smart, versatile machines ready to meet the complexities and unpredictability of the real world.