Slot Structured World Models (2402.03326v1)
Abstract: The ability to perceive and reason about individual objects and their interactions is an important goal in building intelligent artificial systems. State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interactions between these object embeddings. However, the feedforward encoder cannot extract object-centric representations, nor can it disentangle multiple objects with similar appearance. To solve these issues, we introduce Slot Structured World Models (SSWM), a class of world models that combines an object-centric encoder (based on Slot Attention) with a latent graph-based dynamics model. We evaluate our method on the Spriteworld benchmark with simple rules of physical interaction, where Slot Structured World Models consistently outperform baselines on a range of (multi-step) prediction tasks with action-conditional object interactions. All code to reproduce the paper's experiments is available at https://github.com/JonathanCollu/Slot-Structured-World-Models.
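To make the "latent graph-based dynamics model" in the abstract more concrete, the following is a minimal NumPy sketch of one message-passing transition step over a set of slot embeddings. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`mlp_params`, `mlp`, `gnn_transition`), the layer sizes, the fully connected graph, the sum aggregation, the residual update, and the choice to broadcast a single action vector to every slot are all hypothetical. The Slot Attention encoder is omitted, and random vectors stand in for its output.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_params(d_in, d_hidden, d_out):
    """Random, untrained weights for a two-layer MLP (illustrative only)."""
    return {
        "W1": rng.normal(0.0, 0.1, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0.0, 0.1, (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def mlp(p, x):
    """Two-layer MLP with a ReLU hidden activation."""
    h = np.maximum(x @ p["W1"] + p["b1"], 0.0)
    return h @ p["W2"] + p["b2"]

def gnn_transition(slots, action, edge_p, node_p):
    """One message-passing step over a fully connected graph of slots.

    slots:  (K, D) object embeddings (stand-in for an encoder's output)
    action: (A,) action vector, copied to every slot (an assumption)
    returns (K, D) predicted next-step embeddings
    """
    K, _ = slots.shape
    # All ordered pairs (src, dst) with src != dst.
    src, dst = np.where(~np.eye(K, dtype=bool))
    edge_in = np.concatenate([slots[src], slots[dst]], axis=1)
    messages = mlp(edge_p, edge_in)
    # Sum the incoming messages for each receiving slot.
    agg = np.zeros((K, messages.shape[1]))
    np.add.at(agg, dst, messages)
    # Node update sees the slot itself, the action, and the aggregated messages.
    act = np.tile(action, (K, 1))
    node_in = np.concatenate([slots, act, agg], axis=1)
    # Predict a change in latent state and apply it residually.
    return slots + mlp(node_p, node_in)

# Hypothetical sizes: K slots, D slot dims, A action dims, H hidden dims.
K, D, A, H = 4, 8, 4, 16
edge_p = mlp_params(2 * D, H, H)
node_p = mlp_params(D + A + H, H, D)
slots = rng.normal(size=(K, D))
action = rng.normal(size=A)
next_slots = gnn_transition(slots, action, edge_p, node_p)
```

Because the edge and node functions are shared across slots and the aggregation is a sum, reordering the input slots reorders the predictions in the same way, which is the property that makes a graph-structured transition a natural fit for a set of object embeddings.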
Authors:
- Jonathan Collu
- Riccardo Majellaro
- Aske Plaat
- Thomas M. Moerland