Enhancing MAP-Elites with Multiple Parallel Evolution Strategies (2303.06137v2)
Abstract: With the development of fast and massively parallel evaluations in many domains, Quality-Diversity (QD) algorithms, which have already proved promising in a wide range of applications, have seen their potential multiplied. However, we have yet to understand how best to use a large number of evaluations, as spending them on random variations alone is not always effective. High-dimensional search spaces are one typical situation where random variations struggle to search effectively. Another is uncertain settings, where solutions can appear better than they truly are, so naively evaluating more solutions can mislead QD algorithms. In this work, we propose MAP-Elites-Multi-ES (MEMES), a novel QD algorithm based on Evolution Strategies (ES) designed to exploit fast parallel evaluations more effectively. MEMES maintains multiple (up to 100) simultaneous ES processes, each with its own independent objective and reset mechanism designed for QD optimisation, all on a single GPU. We show that MEMES outperforms both gradient-based and mutation-based QD algorithms on black-box optimisation and QD-Reinforcement-Learning tasks, demonstrating its benefit across domains. It also outperforms sampling-based QD methods in uncertain domains when given the same evaluation budget. Overall, MEMES generates reproducible solutions that are high-performing and diverse through large-scale ES optimisation on easily accessible hardware.
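To make the core idea concrete, below is a minimal sketch of how many independent ES processes can run simultaneously on one accelerator with `jax.vmap`, in the spirit of MEMES. This is not the authors' implementation: the population size, dimensionality, hyperparameters, and placeholder sphere objective are all illustrative assumptions, and the sketch omits MEMES's QD-specific machinery (per-process objectives, archive insertion, and the reset mechanism).

```python
# Hypothetical sketch: many independent OpenAI-ES-style processes updated in
# parallel via jax.vmap. All constants and the objective are assumptions, not
# values from the paper.
import jax
import jax.numpy as jnp

NUM_ES = 100    # number of simultaneous ES processes (paper reports up to 100)
POP_SIZE = 64   # samples per ES generation (assumed)
DIM = 32        # search-space dimensionality (assumed)
SIGMA = 0.02    # perturbation standard deviation (assumed)
LR = 0.01       # ES learning rate (assumed)

def fitness(x):
    # Placeholder sphere objective; in MEMES each process would instead score
    # its own independent objective (e.g. fitness or novelty).
    return -jnp.sum(x ** 2)

def es_step(mean, key):
    # One ES generation for a single process: sample perturbations, score
    # them, and move the mean along the rank-normalised gradient estimate.
    eps = jax.random.normal(key, (POP_SIZE, DIM))
    scores = jax.vmap(fitness)(mean + SIGMA * eps)
    ranks = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad = (ranks[:, None] * eps).mean(axis=0) / SIGMA
    return mean + LR * grad

@jax.jit
def parallel_es_step(means, keys):
    # vmap turns the single-process update into NUM_ES simultaneous ones,
    # which is what makes running many ES processes on one GPU tractable.
    return jax.vmap(es_step)(means, keys)

key = jax.random.PRNGKey(0)
means = jnp.zeros((NUM_ES, DIM))
for _ in range(10):
    key, sub = jax.random.split(key)
    means = parallel_es_step(means, jax.random.split(sub, NUM_ES))
```

In the full algorithm, each process would additionally be assigned its own objective and be restarted by a reset mechanism designed for QD optimisation, as the abstract describes; the sketch only shows the vectorised-update pattern that lets up to 100 such processes share a single GPU.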