
Slot Structured World Models (2402.03326v1)

Published 8 Jan 2024 in cs.CV and cs.LG

Abstract: The ability to perceive and reason about individual objects and their interactions is a key goal for building intelligent artificial systems. State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interactions between these embeddings. However, a feedforward encoder cannot extract object-centric representations, nor can it disentangle multiple objects with similar appearance. To address these issues, we introduce Slot Structured World Models (SSWM), a class of world models that combines an object-centric encoder (based on Slot Attention) with a latent graph-based dynamics model. We evaluate our method on the Spriteworld benchmark with simple rules of physical interaction, where Slot Structured World Models consistently outperform baselines on a range of (multi-step) prediction tasks with action-conditional object interactions. All code to reproduce the paper's experiments is available at https://github.com/JonathanCollu/Slot-Structured-World-Models.
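The abstract describes a two-stage pipeline: an object-centric encoder that assigns image features to a fixed set of slots via iterative attention, followed by a graph-style dynamics model that predicts each slot's next state from pairwise interactions and the action. The NumPy sketch below illustrates that structure only; the function names, the simplified parameter-free attention update, and the toy edge and action functions are illustrative assumptions, not the paper's actual learned implementation (which uses trained networks for all of these).

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, slots, iters=3):
    """Simplified Slot Attention: slots compete for input features.

    inputs: (N, D) flattened image features; slots: (K, D) slot vectors.
    The softmax is taken over the slot axis, so slots compete for each
    input location; each slot is then updated as a weighted mean of the
    inputs it claimed. (The learned projections and GRU update of the
    real model are omitted here.)
    """
    d = inputs.shape[1]
    for _ in range(iters):
        logits = slots @ inputs.T / np.sqrt(d)          # (K, N)
        attn = softmax(logits, axis=0)                  # compete over slots
        attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = attn @ inputs                           # weighted mean update
    return slots

def graph_dynamics(slots, action):
    """Toy latent graph dynamics: next_slot_i = slot_i + sum_j edge(i, j) + act(i).

    The edge function (tanh of slot differences) and the additive action
    term are hypothetical stand-ins for the learned message-passing and
    action-conditioning networks of the dynamics model.
    """
    k, _ = slots.shape
    delta = np.zeros_like(slots)
    for i in range(k):
        for j in range(k):
            if i != j:
                delta[i] += 0.1 * np.tanh(slots[i] - slots[j])  # pairwise message
        delta[i] += 0.1 * action                                # action effect
    return slots + delta
```

A multi-step prediction, as evaluated in the paper, would then amount to encoding a frame once with `slot_attention` and rolling `graph_dynamics` forward repeatedly in latent space, one call per action.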
