OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (2405.15568v2)
Abstract: Open-ended and AI-generating algorithms aim to continuously generate and solve increasingly complex tasks indefinitely, offering a promising path toward more general intelligence. To accomplish this grand vision, learning must occur within a vast array of potential tasks. Existing approaches to automatically generating environments are constrained within manually predefined, often narrow distributions of environments, limiting their ability to create any learning environment. To address this limitation, we introduce a novel framework, OMNI-EPIC, that augments previous work in Open-endedness via Models of human Notions of Interestingness (OMNI) with Environments Programmed in Code (EPIC). OMNI-EPIC leverages foundation models to autonomously generate code specifying the next learnable (i.e., not too easy or difficult for the agent's current skill set) and interesting (e.g., worthwhile and novel) tasks. OMNI-EPIC generates both environments (e.g., an obstacle course) and reward functions (e.g., progressing quickly through the obstacle course without touching red objects), enabling it, in principle, to create any simulatable learning task. We showcase the explosive creativity of OMNI-EPIC, which continuously innovates to suggest new, interesting learning challenges. We also highlight how OMNI-EPIC can adapt to reinforcement learning agents' learning progress, generating tasks that are of suitable difficulty. Overall, OMNI-EPIC can endlessly create learnable and interesting environments, further propelling the development of self-improving AI systems and AI-Generating Algorithms. Project website with videos: https://dub.sh/omniepic
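The loop the abstract describes, where a foundation model proposes tasks as code, a model of interestingness filters them, and an agent is trained on the survivors, can be sketched minimally as follows. This is an illustrative skeleton, not the paper's implementation: `propose_task`, `is_interesting`, and `train_agent` are hypothetical stand-ins for the foundation-model and reinforcement-learning components.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    description: str   # natural-language task description
    env_code: str      # generated code that builds the simulated environment
    reward_code: str   # generated code defining the reward function
    success: bool = False  # whether the agent learned to solve it

def propose_task(archive: List[Task]) -> Task:
    """Stand-in for the foundation-model call that writes new task code.
    A real system would prompt an LLM with the archive of past tasks."""
    n = len(archive)
    return Task(
        description=f"obstacle course v{n}",
        env_code=f"# build an obstacle course with {n + 1} hurdles",
        reward_code="# reward fast progress; penalize touching red objects",
    )

def is_interesting(task: Task, archive: List[Task]) -> bool:
    """Stand-in for the model-of-interestingness filter: here, a crude
    novelty proxy that rejects tasks duplicating an archived description."""
    return all(task.description != t.description for t in archive)

def train_agent(task: Task) -> bool:
    """Stand-in for RL training; returns whether the task proved learnable.
    For illustration, every proposed task is treated as solved."""
    return True

def omni_epic_loop(iterations: int) -> List[Task]:
    """Generate, filter, train, and archive tasks for a fixed budget."""
    archive: List[Task] = []
    for _ in range(iterations):
        task = propose_task(archive)
        if not is_interesting(task, archive):
            continue  # skip tasks judged uninteresting
        task.success = train_agent(task)
        archive.append(task)  # archive grows, conditioning later proposals
    return archive
```

The key structural point is that the archive feeds back into proposal: each new task is conditioned on what the agent has already encountered, which is what lets difficulty and novelty track the agent's current skill set.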
- Human-Timescale Adaptation in an Open-Ended Task Space. In Proceedings of the 40th International Conference on Machine Learning, pages 1887–1935. PMLR.
- Managing AI Risks in an Era of Rapid Progress. arXiv:2310.17688 [cs].
- On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
- Quality-diversity through AI feedback. arXiv preprint arXiv:2310.13032.
- Language Models are Few-Shot Learners. arXiv:2005.14165 [cs].
- Genie: Generative Interactive Environments. arXiv:2402.15391 [cs].
- Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30.
- Clune, J. (2020). AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv:1905.10985 [cs].
- PyBullet, a Python module for physics simulation for games, robotics and machine learning.
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design. In Advances in Neural Information Processing Systems, volume 33, pages 13049–13061. Curran Associates, Inc.
- Quality diversity through human feedback. arXiv preprint arXiv:2310.12103.
- Open Questions in Creating Safe Open-ended AI: Tensions Between Control and Creativity. arXiv:2006.07495 [cs].
- Mastering Diverse Domains through World Models. arXiv:2301.04104 [cs, stat].
- Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286 [cs].
- Introduction to automata theory, languages, and computation. ACM SIGACT News, 32(1):60–65.
- Perceiver IO: A General Architecture for Structured Inputs & Outputs. arXiv:2107.14795 [cs, eess].
- Prioritized Level Replay. In Proceedings of the 38th International Conference on Machine Learning, pages 4940–4950. PMLR.
- General Intelligence Requires Rethinking Exploration. arXiv:2211.07819 [cs].
- Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft. arXiv preprint arXiv:2106.14876.
- Specification gaming: the flip side of AI ingenuity. https://deepmind.google/discover/blog/specification-gaming-the-flip-side-of-ai-ingenuity/.
- Evolution Through Large Models. In Banzhaf, W., Machado, P., and Zhang, M., editors, Handbook of Evolutionary Machine Learning, pages 331–366. Springer Nature Singapore, Singapore.
- Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
- Large Language Models as In-context AI Generators for Quality-Diversity. arXiv preprint arXiv:2404.15794.
- Eureka: Human-Level Reward Design via Coding Large Language Models.
- OpenAI (2024). Text embedding 3 small. https://platform.openai.com/docs/guides/embeddings/embedding-models. Accessed: 17 May 2024.
- Solving Rubik’s Cube with a Robot Hand. arXiv:1910.07113 [cs, stat].
- Evolving Curricula with Regret-Based Environment Design. In Proceedings of the 39th International Conference on Machine Learning, pages 17473–17498. PMLR.
- Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
- MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning.
- Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
- Why greatness cannot be planned: The myth of the objective. Springer.
- Open-endedness: The last grand challenge you’ve never heard of. While open-endedness could be a force for discovering intelligence, it could also be a component of AI itself.
- MarioGPT: Open-Ended Text2Level Generation through Large Language Models.
- Reinforcement learning: An introduction. Adaptive Computation and Machine Learning series. The MIT Press, Cambridge, Massachusetts, second edition.
- Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 23–30. IEEE.
- Level Generation Through Large Language Models. In Proceedings of the 18th International Conference on the Foundations of Digital Games, FDG ’23, pages 1–8, New York, NY, USA. Association for Computing Machinery.
- Gymnasium.
- Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).
- Attention is All you Need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
- Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291.
- GenSim: Generating Robotic Simulation Tasks via Large Language Models.
- Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions. arXiv:1901.01753 [cs].
- Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions. In Proceedings of the 37th International Conference on Machine Learning, pages 9940–9951. PMLR.
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation. arXiv:2311.01455 [cs].
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents. arXiv:2403.12014 [cs].
- OMNI: Open-endedness via Models of human Notions of Interestingness.