
Scaling Face Interaction Graph Networks to Real World Scenes

(2401.11985)
Published Jan 22, 2024 in cs.LG, cs.CV, and cs.RO

Abstract

Accurately simulating real world object dynamics is essential for various applications such as robotics, engineering, graphics, and design. To better capture complex real dynamics such as contact and friction, learned simulators based on graph networks have recently shown great promise. However, applying these learned simulators to real scenes comes with two major challenges: first, scaling learned simulators to handle the complexity of real world scenes, which can involve hundreds of objects each with complicated 3D shapes, and second, handling inputs from perception rather than 3D state information. Here we introduce a method which substantially reduces the memory required to run graph-based learned simulators. Based on this memory-efficient simulation model, we then present a perceptual interface in the form of editable NeRFs which can convert real-world scenes into a structured representation that can be processed by a graph network simulator. We show that our method uses substantially less memory than previous graph-based simulators while retaining their accuracy, and that the simulators learned in synthetic environments can be applied to real world scenes captured from multiple camera angles. This paves the way for expanding the application of learned simulators to settings where only perceptual information is available at inference time.

Overview

  • Graph neural networks (GNNs) can represent object interactions in simulations as graphs, but they struggle with the complexity of real-world scenes.

  • FIGNet*, a modified version of FIGNet, reduces memory usage by simplifying the graph structure, which makes it possible to handle objects with intricate geometries.

  • By using Neural Radiance Fields (NeRFs) for perception, FIGNet* extracts real-world meshes for simulations.

  • Trained only on synthetic data, FIGNet* predicts plausible object trajectories in real-world scenes despite noisy mesh estimates from perception.

  • Combining NeRFs with FIGNet* shows promise for robotics and virtual scene editing, and points toward a new direction for system identification.

Introduction

The simulation of rigid body dynamics plays a critical role in applications across robotics, graphics, and engineering. Analytic simulators, while widely deployed, often struggle to accurately capture the nuanced interactions between objects in real-world scenes, leading to the well-known simulation-to-reality gap. Learned simulators based on graph neural networks (GNNs) have made progress in predicting the dynamics of objects by representing their interactions as graph structures. However, when transitioning from synthetic environments to real-world settings, the complexity of object geometries and the demand for perception-driven inputs pose significant challenges.
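To make the graph-based approach concrete, the following is a minimal sketch of a single message-passing step in a learned simulator, written in plain NumPy. The random linear maps stand in for learned MLPs and the toy graph is arbitrary, so none of this reflects the paper's actual architecture or feature dimensions.

```python
import numpy as np

# Minimal sketch of one message-passing step in a learned simulator.
# The random linear maps below stand in for learned MLPs; shapes and the
# toy graph are arbitrary and do not reflect the paper's architecture.

rng = np.random.default_rng(0)

num_nodes, node_dim, edge_dim, hidden = 5, 6, 3, 16
node_feats = rng.normal(size=(num_nodes, node_dim))        # e.g. recent velocities
senders = np.array([0, 1, 2, 3, 4])
receivers = np.array([1, 2, 3, 4, 0])
edge_feats = rng.normal(size=(len(senders), edge_dim))     # e.g. relative displacements

W_edge = rng.normal(size=(2 * node_dim + edge_dim, hidden))  # stand-in edge "MLP"
W_node = rng.normal(size=(node_dim + hidden, node_dim))      # stand-in node "MLP"

# Edge update: build a message from sender state, receiver state, and edge features.
messages = np.tanh(
    np.concatenate([node_feats[senders], node_feats[receivers], edge_feats], axis=-1) @ W_edge
)

# Aggregate incoming messages at each receiver node.
aggregated = np.zeros((num_nodes, hidden))
np.add.at(aggregated, receivers, messages)

# Node update: predict a residual state change (e.g. acceleration) and apply it.
next_node_feats = node_feats + np.tanh(
    np.concatenate([node_feats, aggregated], axis=-1) @ W_node
)
print(next_node_feats.shape)  # (5, 6)
```

In real learned simulators this step is repeated several times per timestep, and the decoded output (typically an acceleration) is integrated to produce the next state.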

Advancements in Computational Efficiency

In light of these challenges, Google DeepMind introduces a modification of its Face Interaction Graph Networks (FIGNet) simulator, known as FIGNet*. The principal enhancement is a memory optimization achieved by a simple yet highly effective architectural change: the removal of node-node (surface mesh) edges within the graph structure. This alteration significantly reduces the memory footprint, allowing FIGNet* to be trained on datasets featuring objects with intricate geometries. Consequently, FIGNet* outperforms its predecessor not only in memory efficiency but also by enabling training on complex scenes such as the Kubric MOVi-C dataset, previously unmanageable due to memory restrictions.
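The memory argument can be illustrated with a toy edge count: node-node edges grow with the resolution of each object's surface mesh, while face-face collision edges grow only with the number of face pairs that are actually near contact. The sketch below is purely illustrative, with made-up mesh and contact counts, and is not FIGNet code.

```python
import numpy as np

# Illustrative comparison (not FIGNet code): node-node mesh edges scale with
# mesh resolution, whereas face-face collision edges scale only with the
# number of face pairs near contact. All numbers are made up.

def mesh_edge_count(faces: np.ndarray) -> int:
    """Count the undirected node-node edges implied by a triangle mesh."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return len(edges)

# A single object with ~10k triangles, typical of detailed real-world shapes.
rng = np.random.default_rng(0)
faces = rng.integers(0, 5_000, size=(10_000, 3))

node_node_edges = mesh_edge_count(faces)   # grows with mesh resolution
face_face_edges = 40                       # assumed number of contacting face pairs

print("node-node (surface mesh) edges:", node_node_edges)
print("face-face collision edges     :", face_face_edges)
```

Dropping the node-node edges therefore removes the term that dominates memory for finely meshed objects, while the collision edges that carry the contact information are kept.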

Perception Integration and Real-World Application

The paper details methods for connecting the FIGNet* model to real-world perception. By employing Neural Radiance Fields (NeRFs) as a perceptual front-end, the authors extract the meshes required for simulation from real scenes. Furthermore, they demonstrate that the trained simulator can predict plausible object trajectories in previously unobserved real-world scenes. Notably, despite FIGNet* being trained with precise synthetic data, the model showcases robust performance when applied to noisy mesh estimates from real-world NeRF data.
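One common way to turn a trained NeRF into simulator-ready geometry is to sample its density field on a regular grid and run marching cubes. The sketch below shows that generic recipe with scikit-image; `query_nerf_density` is a hypothetical stand-in for the trained model, and the resolution and threshold are assumptions rather than the authors' settings.

```python
import numpy as np
from skimage import measure  # pip install scikit-image

# Hedged sketch of a NeRF-to-mesh step, not the authors' exact pipeline:
# sample the trained NeRF's density on a grid, then run marching cubes to
# obtain a triangle mesh that a graph-network simulator can consume.
# `query_nerf_density` is a hypothetical callable standing in for the model.

def nerf_to_mesh(query_nerf_density, bounds=(-1.0, 1.0), resolution=128, threshold=50.0):
    xs = np.linspace(bounds[0], bounds[1], resolution)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)  # (R, R, R, 3)
    density = query_nerf_density(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)

    # Marching cubes returns vertices (in voxel units) and triangle faces.
    verts, faces, _, _ = measure.marching_cubes(density, level=threshold)

    # Rescale vertices from voxel coordinates back to world coordinates.
    voxel_size = (bounds[1] - bounds[0]) / (resolution - 1)
    verts = verts * voxel_size + bounds[0]
    return verts, faces

# Example with a dummy "density" field (a sphere) in place of a real NeRF:
dummy_density = lambda p: 100.0 * (np.linalg.norm(p, axis=-1) < 0.5)
verts, faces = nerf_to_mesh(dummy_density)
print(verts.shape, faces.shape)
```

In practice the extracted meshes are noisy, which is why the robustness of the synthetically trained simulator to imperfect geometry matters.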

Results and Implications

The authors present FIGNet*'s capacity to retain accuracy while using substantially less memory than previous graph-based simulators. They also demonstrate that FIGNet*, once trained on synthetic rigid body dynamics, can operate on perceptual information at test time in real-world environments. The combination of NeRFs with FIGNet* facilitates the simulation of alternative physical futures within actual scenes, showing significant promise for applications in fields including robotics and virtual scene editing. Moreover, this research suggests the potential for future work on fine-tuning pre-trained models with real-world dynamics, providing a novel direction for system identification in robotics.
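As a rough picture of how such a perception-to-simulation loop might be used, the sketch below rolls a learned simulator forward on meshes extracted from a real scene, optionally perturbing the initial state to explore an alternative future. The `simulator.step` method, the state dictionaries, and the perturbation hook are illustrative conventions, not a released API.

```python
# Hypothetical usage sketch of the perception-to-simulation loop described above.
# `simulator.step`, the state dictionaries, and the perturbation hook are
# illustrative names and conventions, not an actual released API.

def simulate_alternative_future(scene_meshes, initial_states, simulator,
                                num_steps=50, perturb=None):
    """Roll a learned simulator forward on meshes extracted from a real scene.

    Optionally perturb the initial state (e.g. nudge one object) to explore an
    alternative physical future of the same captured scene.
    """
    states = dict(initial_states)
    if perturb is not None:
        states = perturb(states)

    trajectory = [states]
    for _ in range(num_steps):
        states = simulator.step(scene_meshes, states)  # one learned dynamics step
        trajectory.append(states)
    return trajectory
```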
