CANDID DAC: Leveraging Coupled Action Dimensions with Importance Differences in DAC (2407.05789v2)

Published 8 Jul 2024 in cs.LG and cs.AI

Abstract: High-dimensional action spaces remain a challenge for dynamic algorithm configuration (DAC). Interdependencies between action dimensions and differences in their importance are, moreover, known key characteristics of DAC problems. We argue that these Coupled Action Dimensions with Importance Differences (CANDID) represent aspects of the DAC problem that are not yet fully explored. To address this gap, we introduce a new white-box benchmark within the DACBench suite that simulates the properties of CANDID. Further, we propose sequential policies as an effective strategy for managing these properties. Such policies factorize the action space and mitigate its exponential growth by learning a policy per action dimension. At the same time, these policies accommodate the interdependence of action dimensions by fostering implicit coordination. We show this in an experimental study of value-based policies on our new benchmark. This study demonstrates that sequential policies significantly outperform independent learning of factorized policies in CANDID action spaces. In addition, they overcome the scalability limitations associated with learning a single policy across all action dimensions. The code used for our experiments is available at https://github.com/PhilippBordne/candidDAC.
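The central idea of the abstract, factorizing a high-dimensional action space into per-dimension policies that each condition on the sub-actions already chosen, can be illustrated with a minimal sketch. The snippet below is not the authors' implementation from the linked repository; it is a toy, tabular, epsilon-greedy example in which the state space size, the per-dimension action cardinalities, and all variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_states = 10            # hypothetical small discrete state space
dim_sizes = [3, 4, 5]    # hypothetical per-dimension action cardinalities
epsilon = 0.1            # exploration rate for epsilon-greedy selection

# One Q-table per action dimension. The table for dimension d is indexed by
# the state *and* the sub-actions already chosen for dimensions 0..d-1, so
# its row count grows as n_states * prod(dim_sizes[:d]) rather than storing
# the full joint action space in a single table.
q_tables = []
prefix_size = 1
for size in dim_sizes:
    q_tables.append(np.zeros((n_states * prefix_size, size)))
    prefix_size *= size


def select_action(state):
    """Choose sub-actions one dimension at a time (epsilon-greedy),
    conditioning each choice on the state and on earlier sub-actions."""
    action = []
    prefix_idx, prefix_size = 0, 1
    for d, size in enumerate(dim_sizes):
        row = state * prefix_size + prefix_idx
        if rng.random() < epsilon:
            a_d = int(rng.integers(size))
        else:
            a_d = int(np.argmax(q_tables[d][row]))
        action.append(a_d)
        # Fold this choice into the index used by the next dimension's table.
        prefix_idx = prefix_idx * size + a_d
        prefix_size *= size
    return action


print(select_action(state=2))  # e.g. [0, 0, 0] before any learning
```

Because each later dimension sees the earlier choices through its table index, the sub-policies can coordinate implicitly while the per-table size stays far below that of the joint action space; how the paper's value-based sequential policies realize and train this structure is detailed in the full text.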

