OMNI: Open-endedness via Models of human Notions of Interestingness (2306.01711v3)
Abstract: Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but such a space contains infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., tasks with learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles' heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also $\textit{interesting}$ (e.g., worthwhile and novel). We propose solving this problem via $\textit{Open-endedness via Models of human Notions of Interestingness}$ (OMNI). The insight is that we can utilize foundation models (FMs) as a model of interestingness (MoI), because they $\textit{already}$ internalize human concepts of interestingness from training on vast amounts of human-generated data, in which humans naturally write about what they find interesting or boring. We show that FM-based MoIs improve open-ended learning by focusing on tasks that are both learnable $\textit{and interesting}$, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms. Project website at https://www.jennyzhangzt.com/omni/
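The core mechanism described above — weighting each candidate task by its learning progress, gated by a foundation-model judgment of interestingness — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `model_of_interestingness` stub below returns hand-written scores, whereas in OMNI that call is a prompt to a foundation model; the task names and learning-progress values are likewise hypothetical.

```python
# Sketch of OMNI-style task sampling: learning progress gated by an
# interestingness score. All task names, scores, and the MoI stub are
# illustrative assumptions, not values from the paper.

def model_of_interestingness(task: str) -> float:
    """Stub MoI returning a score in [0, 1]. In OMNI this would be a
    foundation-model query over the agent's learning history."""
    scores = {
        "collect wood": 0.9,           # novel, worthwhile task
        "collect wood (variant)": 0.1, # minor variation of a learned task
        "craft table": 0.8,
    }
    return scores.get(task, 0.5)


def sampling_weights(tasks, learning_progress, moi=model_of_interestingness):
    """Return a probability distribution over tasks, weighting each by
    learning progress * interestingness, so the auto-curriculum focuses
    on tasks that are both learnable AND interesting."""
    raw = [learning_progress[t] * moi(t) for t in tasks]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(tasks)] * len(tasks)  # fall back to uniform
    return [w / total for w in raw]


tasks = ["collect wood", "collect wood (variant)", "craft table"]
# Both "collect wood" tasks show identical learning progress, so a
# progress-only curriculum cannot tell them apart; the MoI can.
lp = {"collect wood": 0.5, "collect wood (variant)": 0.5, "craft table": 0.3}
weights = sampling_weights(tasks, lp)
```

Note the key contrast with a learning-progress-only baseline: the two "collect wood" tasks have identical learning progress, so only the MoI term down-weights the uninteresting variant.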