
Scaling Face Interaction Graph Networks to Real World Scenes

(2401.11985)
Published Jan 22, 2024 in cs.LG, cs.CV, and cs.RO

Abstract

Accurately simulating real world object dynamics is essential for various applications such as robotics, engineering, graphics, and design. To better capture complex real dynamics such as contact and friction, learned simulators based on graph networks have recently shown great promise. However, applying these learned simulators to real scenes comes with two major challenges: first, scaling learned simulators to handle the complexity of real world scenes, which can involve hundreds of objects each with complicated 3D shapes, and second, handling inputs from perception rather than 3D state information. Here we introduce a method which substantially reduces the memory required to run graph-based learned simulators. Based on this memory-efficient simulation model, we then present a perceptual interface in the form of editable NeRFs which can convert real-world scenes into a structured representation that can be processed by a graph network simulator. We show that our method uses substantially less memory than previous graph-based simulators while retaining their accuracy, and that the simulators learned in synthetic environments can be applied to real world scenes captured from multiple camera angles. This paves the way for expanding the application of learned simulators to settings where only perceptual information is available at inference time.

Overview

  • Graph neural networks (GNNs) can represent object interactions in simulations as graphs, but they struggle with the complexity of real-world scenes.

  • FIGNet*, a modified version of FIGNet, reduces memory usage by simplifying the graph structure, which makes it possible to handle objects with intricate geometries.

  • By using Neural Radiance Fields (NeRFs) for perception, FIGNet* extracts real-world meshes for simulations.

  • Trained only on synthetic data, FIGNet* predicts plausible object trajectories in real-world scenes despite noisy mesh estimates from perception.

  • Combining NeRFs with FIGNet* shows promise for robotics and virtual scene editing, and points toward a new direction for system identification.

Introduction

The simulation of rigid body dynamics plays a critical role in applications across robotics, graphics, and engineering. Analytic simulators, while widely deployed, often struggle to accurately capture the nuanced interactions between objects in real-world scenes, leading to the well-known simulation-to-reality gap. Learned simulators based on graph neural networks (GNNs) have made progress in predicting the dynamics of objects by representing their interactions as graph structures. However, when transitioning from synthetic environments to real-world settings, the complexity of object geometries and the demand for perception-driven inputs pose significant challenges.
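To make the graph-based approach concrete, the following is a minimal sketch of a single message-passing step in a learned simulator, written in plain NumPy. The random linear maps stand in for learned MLPs and the toy graph is arbitrary, so none of this reflects the paper's actual architecture or feature dimensions.

```python
import numpy as np

# Minimal sketch of one message-passing step in a learned simulator.
# The random linear maps below stand in for learned MLPs; shapes and the
# toy graph are arbitrary and do not reflect the paper's architecture.

rng = np.random.default_rng(0)

num_nodes, node_dim, edge_dim, hidden = 5, 6, 3, 16
node_feats = rng.normal(size=(num_nodes, node_dim))        # e.g. recent velocities
senders = np.array([0, 1, 2, 3, 4])
receivers = np.array([1, 2, 3, 4, 0])
edge_feats = rng.normal(size=(len(senders), edge_dim))     # e.g. relative displacements

W_edge = rng.normal(size=(2 * node_dim + edge_dim, hidden))  # stand-in edge "MLP"
W_node = rng.normal(size=(node_dim + hidden, node_dim))      # stand-in node "MLP"

# Edge update: build a message from sender state, receiver state, and edge features.
messages = np.tanh(
    np.concatenate([node_feats[senders], node_feats[receivers], edge_feats], axis=-1) @ W_edge
)

# Aggregate incoming messages at each receiver node.
aggregated = np.zeros((num_nodes, hidden))
np.add.at(aggregated, receivers, messages)

# Node update: predict a residual state change (e.g. acceleration) and apply it.
next_node_feats = node_feats + np.tanh(
    np.concatenate([node_feats, aggregated], axis=-1) @ W_node
)
print(next_node_feats.shape)  # (5, 6)
```

In real learned simulators this step is repeated several times per timestep, and the decoded output (typically an acceleration) is integrated to produce the next state.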

Advancements in Computational Efficiency

In light of these challenges, Google DeepMind introduces a modification of its Face Interaction Graph Networks (FIGNet) simulator, known as FIGNet*. The principal enhancement is a memory optimization achieved by a simple yet highly effective architectural change: the removal of node-node (surface mesh) edges within the graph structure. This alteration significantly reduces the memory footprint, allowing FIGNet* to be trained on datasets featuring objects with intricate geometries. Consequently, FIGNet* outperforms its predecessor not only in memory efficiency but also by enabling training on complex scenes such as the Kubric MOVi-C dataset, previously unmanageable due to memory restrictions.
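The memory argument can be illustrated with a toy edge count: node-node edges grow with the resolution of each object's surface mesh, while face-face collision edges grow only with the number of face pairs that are actually near contact. The sketch below is purely illustrative, with made-up mesh and contact counts, and is not FIGNet code.

```python
import numpy as np

# Illustrative comparison (not FIGNet code): node-node mesh edges scale with
# mesh resolution, whereas face-face collision edges scale only with the
# number of face pairs near contact. All numbers are made up.

def mesh_edge_count(faces: np.ndarray) -> int:
    """Count the undirected node-node edges implied by a triangle mesh."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return len(edges)

# A single object with ~10k triangles, typical of detailed real-world shapes.
rng = np.random.default_rng(0)
faces = rng.integers(0, 5_000, size=(10_000, 3))

node_node_edges = mesh_edge_count(faces)   # grows with mesh resolution
face_face_edges = 40                       # assumed number of contacting face pairs

print("node-node (surface mesh) edges:", node_node_edges)
print("face-face collision edges     :", face_face_edges)
```

Dropping the node-node edges therefore removes the term that dominates memory for finely meshed objects, while the collision edges that carry the contact information are kept.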

Perception Integration and Real-World Application

The paper details methods for connecting the FIGNet* model to real-world perception. By employing Neural Radiance Fields (NeRFs) as a perceptual front-end, the authors extract the meshes required for simulation from real scenes. Furthermore, they demonstrate that the trained simulator can predict plausible object trajectories in previously unobserved real-world scenes. Notably, despite FIGNet* being trained with precise synthetic data, the model showcases robust performance when applied to noisy mesh estimates from real-world NeRF data.
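One common way to turn a trained NeRF into simulator-ready geometry is to sample its density field on a regular grid and run marching cubes. The sketch below shows that generic recipe with scikit-image; `query_nerf_density` is a hypothetical stand-in for the trained model, and the resolution and threshold are assumptions rather than the authors' settings.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

# Hedged sketch of a NeRF-to-mesh step, not the authors' exact pipeline:
# sample the trained NeRF's density on a grid, then run marching cubes to
# obtain a triangle mesh that a graph-network simulator can consume.
# `query_nerf_density` is a hypothetical callable standing in for the model.

def nerf_to_mesh(query_nerf_density, bounds=(-1.0, 1.0), resolution=128, threshold=50.0):
    xs = np.linspace(bounds[0], bounds[1], resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    density = query_nerf_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)

    # Marching cubes returns vertices (in voxel units) and triangle faces.
    verts, faces, _, _ = measure.marching_cubes(density, level=threshold)

    # Rescale vertices from voxel coordinates back to world coordinates.
    voxel_size = (bounds[1] - bounds[0]) / (resolution - 1)
    verts = verts * voxel_size + bounds[0]
    return verts, faces

# Example with a dummy "density" field (a sphere) in place of a real NeRF:
dummy_density = lambda p: 100.0 * (np.linalg.norm(p, axis=-1) < 0.5)
verts, faces = nerf_to_mesh(dummy_density)
print(verts.shape, faces.shape)
```

In practice the extracted meshes are noisy, which is why the robustness of the synthetically trained simulator to imperfect geometry matters.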

Results and Implications

The authors present FIGNet*'s capacity to retain accuracy while using substantially less memory than previous graph-based simulators. They also demonstrate that FIGNet*, once trained on synthetic rigid body dynamics, can operate on perceptual information at test time in real-world environments. The combination of NeRFs with FIGNet* facilitates the simulation of alternative physical futures within actual scenes, showing significant promise for applications in fields including robotics and virtual scene editing. Moreover, this research suggests the potential for future work on fine-tuning pre-trained models with real-world dynamics, providing a novel direction for system identification in robotics.
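As a rough picture of how such a perception-to-simulation loop might be used, the sketch below rolls a learned simulator forward on meshes extracted from a real scene, optionally perturbing the initial state to explore an alternative future. The `simulator.step` method, the state dictionaries, and the perturbation hook are illustrative conventions, not a released API.

```python
# Hypothetical usage sketch of the perception-to-simulation loop described above.
# `simulator.step`, the state dictionaries, and the perturbation hook are
# illustrative names and conventions, not an actual released API.

def simulate_alternative_future(scene_meshes, initial_states, simulator,
                                num_steps=50, perturb=None):
    """Roll a learned simulator forward on meshes extracted from a real scene.

    Optionally perturb the initial state (e.g. nudge one object) to explore an
    alternative physical future of the same captured scene.
    """
    states = dict(initial_states)
    if perturb is not None:
        states = perturb(states)

    trajectory = [states]
    for _ in range(num_steps):
        states = simulator.step(scene_meshes, states)  # one learned dynamics step
        trajectory.append(states)
    return trajectory
```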
