Learning Causal World Models with State Space Models

This presentation explores a novel approach to world modeling that combines State Space Models with causal discovery techniques. By representing environments as interacting objects and using attention mechanisms to infer causal relationships, the S2-SSM architecture achieves competitive performance with Transformers while opening new possibilities for understanding how AI agents can learn and reason about dynamic environments.
Script
Most AI systems predict what happens next, but few understand why. The researchers behind this work asked whether State Space Models could learn not just correlations but actual causal relationships in dynamic environments.
The key insight is elegantly simple: by applying sparsity regularization to attention weights, the model learns which objects actually influence each other and which interactions can be safely ignored. This transforms a dense tangle of potential relationships into a minimal causal graph that captures what truly matters.
The S2-SSM architecture breaks environments into object slots, each evolving independently through State Space Model layers. Cross-attention layers then model interactions, using attention weights as a direct window into causal structure. The entire system predicts future states while simultaneously learning which objects causally influence one another.
When tested on the Interventional Pong dataset, S2-SSM matched Transformer performance on state prediction. But the real breakthrough appears in causal discovery: with sparsity regularization, S2-SSM learned dramatically more accurate causal graphs, measured by Structural Hamming Distance, than dense attention models.
The current model has clear boundaries: it struggles with long-term dependencies and objects that disappear behind occlusions. These limitations point directly to the next research frontier, where the memory capacities of State Space Models could shine, tracking occluded objects and maintaining causal reasoning over extended time horizons.
This work demonstrates that State Space Models can learn not just to predict but to understand causality in dynamic worlds. That capability could transform how agents adapt to new environments and reason about interventions. To explore more research pushing the boundaries of AI understanding, visit EmergentMind.com and create your own videos from the latest papers.