
Slot Structured World Models (2402.03326v1)

Published 8 Jan 2024 in cs.CV and cs.LG

Abstract: The ability to perceive and reason about individual objects and their interactions is a goal to be achieved for building intelligent artificial systems. State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interaction between these object embeddings. However, the feedforward encoder cannot extract object-centric representations, nor can it disentangle multiple objects with similar appearance. To solve these issues, we introduce Slot Structured World Models (SSWM), a class of world models that combines an object-centric encoder (based on Slot Attention) with a latent graph-based dynamics model. We evaluate our method in the Spriteworld benchmark with simple rules of physical interaction, where Slot Structured World Models consistently outperform baselines on a range of (multi-step) prediction tasks with action-conditional object interactions. All code to reproduce paper experiments is available from https://github.com/JonathanCollu/Slot-Structured-World-Models.
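
The latent graph-based dynamics model mentioned in the abstract can be illustrated with a minimal NumPy sketch of one action-conditional transition step over a set of slot vectors. This is not the authors' implementation: the function names (`mlp`, `init_mlp`, `gnn_transition`), the layer sizes, the sum aggregation, the residual update, and the choice to feed the same action vector to every slot are all assumptions made for illustration. The slots are random stand-ins for the output of a Slot Attention encoder, which is not shown.

```python
import numpy as np


def init_mlp(rng, d_in, d_hidden, d_out):
    """Random parameters for a two-layer perceptron (hypothetical sizes)."""
    return (rng.normal(0.0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, 0.1, (d_hidden, d_out)), np.zeros(d_out))


def mlp(params, x):
    """Two-layer perceptron with a ReLU hidden layer."""
    w1, b1, w2, b2 = params
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def gnn_transition(slots, action, edge_params, node_params):
    """Predict next-step slots from slots of shape (K, D) and an action (A,).

    Assumed design: a fully connected graph over slots, one edge MLP,
    sum aggregation of incoming messages, and a residual node update.
    """
    k = slots.shape[0]
    # All ordered pairs (src, dst) of distinct slots.
    src, dst = np.nonzero(~np.eye(k, dtype=bool))
    edge_in = np.concatenate([slots[src], slots[dst]], axis=1)
    messages = mlp(edge_params, edge_in)
    # Sum the incoming messages for each receiving slot.
    agg = np.zeros((k, messages.shape[1]))
    np.add.at(agg, dst, messages)
    # Node update conditioned on the slot, the shared action, and the messages.
    act = np.broadcast_to(action, (k, action.shape[0]))
    node_in = np.concatenate([slots, act, agg], axis=1)
    return slots + mlp(node_params, node_in)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D, A, H = 4, 8, 3, 16  # slots, slot dim, action dim, hidden dim
    edge_params = init_mlp(rng, 2 * D, H, H)
    node_params = init_mlp(rng, D + A + H, H, D)
    slots = rng.normal(size=(K, D))   # stand-in for encoder output
    action = rng.normal(size=A)
    next_slots = gnn_transition(slots, action, edge_params, node_params)
    print(next_slots.shape)  # (4, 8)
```

Because the edge and node networks are shared across slots and the messages are summed, reordering the input slots reorders the predictions in the same way, which is the property that makes a graph network a natural fit for a set of object slots.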

Authors (4)
  1. Jonathan Collu (2 papers)
  2. Riccardo Majellaro (2 papers)
  3. Aske Plaat (76 papers)
  4. Thomas M. Moerland (24 papers)
