Octopi: Object Property Reasoning with Large Tactile-Language Models (2405.02794v2)
Abstract: Physical reasoning is important for effective robot manipulation. Recent work has investigated both the vision and language modalities for physical reasoning; vision can reveal information about objects in the environment, while language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they are limited to physical properties that can be inferred from visual or language inputs. In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and to apply commonsense reasoning. We contribute a new dataset, PhysiCLeAR, which comprises physical property reasoning tasks together with annotated tactile videos collected using a GelSight tactile sensor. We then introduce Octopi, a system that leverages both tactile representation learning and large vision-language models to predict and reason about tactile inputs with minimal language fine-tuning. Our evaluations on PhysiCLeAR show that Octopi effectively uses intermediate physical property predictions to improve its performance on a range of tactile-related tasks. PhysiCLeAR and Octopi are available at https://github.com/clear-nus/octopi.
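To make the abstract's pipeline concrete, below is a minimal sketch of the general pattern it describes: tactile frames from a GelSight-style sensor are encoded, projected into the token-embedding space of a language model, and prepended to an embedded text prompt so the model can reason about object properties. The module names, dimensions, and projector architecture here are illustrative assumptions, not the authors' actual implementation (see the linked repository for that).

```python
# Hypothetical sketch of a tactile-language pipeline in the spirit of Octopi.
# The projector shape, embedding sizes, and prompt layout are assumptions
# made for illustration only.
import torch
import torch.nn as nn


class TactileProjector(nn.Module):
    """Maps per-frame tactile embeddings into an LLM's token-embedding space."""

    def __init__(self, tactile_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(tactile_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, frame_embeddings: torch.Tensor) -> torch.Tensor:
        # frame_embeddings: (num_frames, tactile_dim) from a frozen tactile encoder
        return self.proj(frame_embeddings)  # (num_frames, llm_dim)


def build_prompt_embeddings(text_embeds: torch.Tensor,
                            tactile_embeds: torch.Tensor) -> torch.Tensor:
    """Prepend projected tactile tokens to the embedded text prompt so a
    (frozen) language model can condition its answer on the tactile video."""
    return torch.cat([tactile_embeds, text_embeds], dim=0)


if __name__ == "__main__":
    # Pretend five GelSight frames were already encoded by a visual encoder.
    frames = torch.randn(5, 768)
    projector = TactileProjector()
    tactile_tokens = projector(frames)   # (5, 4096)
    text_tokens = torch.randn(12, 4096)  # embedded property-reasoning question
    prompt = build_prompt_embeddings(text_tokens, tactile_tokens)
    print(prompt.shape)                  # torch.Size([17, 4096])
```

In such a setup, only the projector (and optionally lightweight adapters in the language model) would be trained, which is consistent with the abstract's claim of minimal language fine-tuning; the specific training recipe is described in the paper and repository rather than here.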