Continuous-time q-learning for mean-field control problems (2306.16208v4)

Published 28 Jun 2023 in cs.LG, math.OC, and q-fin.CP

Abstract: This paper studies q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction among agents makes the definition of the q-function more subtle: two distinct q-functions naturally arise. These are (i) the integrated q-function (denoted by $q$), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learned via a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_e$), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation that holds under all test policies. Based on the weak martingale condition and a proposed method for searching over test policies, we devise model-free learning algorithms. In two examples, one within and one beyond the LQ control framework, we obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.
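
For orientation, the following display sketches one natural reading of the two objects named in the abstract, in notation adapted from Jia and Zhou (2023); the temperature $\gamma$, the test policy $h$, and the exact form of the identities below are our assumptions rather than quotations from the paper:

$$q(t,\mu,h) \;=\; \int_{\mathbb{R}^d} \int_{\mathcal{A}} q_e(t,\mu,x,a)\, h(\mathrm{d}a \mid x)\, \mu(\mathrm{d}x), \qquad \pi^{*}(\mathrm{d}a \mid t,x,\mu) \;\propto\; \exp\!\big( q_e(t,\mu,x,a)/\gamma \big)\, \mathrm{d}a.$$

The first identity is the kind of integral representation the abstract describes, aggregating the essential q-function over the population distribution $\mu$ and a test policy $h$; the second is the Gibbs-form policy improvement standard in entropy-regularized q-learning, with $q_e$ playing the role of the single-agent q-function.

The model-free algorithms rest on a weak martingale condition: along state trajectories, the increment $\mathrm{d}J + (r - q)\,\mathrm{d}t$ should be driftless. The sketch below shows, in the simplest possible setting, how such a condition can be turned into a stochastic-gradient learning rule. It is illustrative only: the single-agent simplification (no measure argument), the quadratic features, the toy LQ reward, the Gaussian exploration, and the squared-increment loss are all assumptions of this sketch, not the paper's algorithm; the paper's weak formulation, which averages the condition against test policies, is more refined than the crude martingale loss used here.

# Illustrative only: a single-agent, toy-LQ sketch of turning the weak
# martingale condition into a learning rule. The features, reward, Gaussian
# exploration, and step sizes are assumptions for this sketch, not the
# paper's algorithm; the mean-field measure argument mu is omitted entirely.
import numpy as np

rng = np.random.default_rng(0)
T, dt = 1.0, 0.01
n_steps = int(T / dt)
lr = 1e-2  # SGD step size (assumed)

# Linear parameterizations J_theta(x) and q_psi(x, a); time-dependence of the
# value function is dropped here purely to keep the sketch short.
theta = np.zeros(3)  # J(x)    ~ theta . [1, x, x^2]
psi = np.zeros(4)    # q(x, a) ~ psi   . [1, x, a, a^2]

def phi_J(x):
    return np.array([1.0, x, x * x])

def phi_q(x, a):
    return np.array([1.0, x, a, a * a])

for episode in range(2000):
    x = rng.normal()
    g_theta = np.zeros_like(theta)
    g_psi = np.zeros_like(psi)
    for _ in range(n_steps):
        a = rng.normal()                     # Gaussian exploratory action (assumed)
        r = -(x * x + a * a)                 # toy LQ running reward (assumed)
        x_next = x + a * dt + 0.1 * np.sqrt(dt) * rng.normal()
        # Martingale condition: dJ + (r - q) dt should be driftless, so we
        # penalize the squared increment along the simulated path.
        delta = theta @ phi_J(x_next) - theta @ phi_J(x) + (r - psi @ phi_q(x, a)) * dt
        g_theta += delta * (phi_J(x_next) - phi_J(x))  # d(delta^2)/d(theta), up to a factor 2
        g_psi += delta * (-dt) * phi_q(x, a)           # d(delta^2)/d(psi),   up to a factor 2
        x = x_next
    theta -= lr * g_theta
    psi -= lr * g_psi

# After training, an entropy-regularized policy improvement step would set
# pi(a | x) proportional to exp(q_psi(x, a) / gamma) for a temperature gamma.

The closing comment is where the two q-functions meet: once the critic $q_\psi$ (standing in for $q_e$) is learned, the Gibbs measure $\propto \exp(q_e/\gamma)$ gives the improved policy, which closes the actor-critic loop that the paper's policy improvement iterations formalize.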

References (11)
  1. R. Carmona and F. Delarue (2018a): Probabilistic Theory of Mean Field Games with Applications, Vol. I. Springer.
  2. R. Carmona and F. Delarue (2018b): Probabilistic Theory of Mean Field Games with Applications, Vol. II. Springer.
  3. K. Doya (2000): Reinforcement learning in continuous time and space. Neural Computation, 12(1):219–245.
  4. Y. Jia and X. Y. Zhou (2022a): Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research, 23:1–50.
  5. Y. Jia and X. Y. Zhou (2022b): Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23:1–55.
  6. Y. Jia and X. Y. Zhou (2023): q-learning in continuous time. Journal of Machine Learning Research, 24:1–61.
  7. D. Lacker (2017): Limit theory for controlled McKean-Vlasov dynamics. SIAM Journal on Control and Optimization, 55(3):1641–1672.
  8. P.-L. Lions (2006): Cours au Collège de France: Théorie des jeux à champ moyens. Audio lectures.
  9. M. Motte and H. Pham (2022): Mean-field Markov decision processes with common noise and open-loop controls. The Annals of Applied Probability, 32(2):1421–1458.
  10. J. Schulman, X. Chen and P. Abbeel (2017): Equivalence between policy gradients and soft Q-learning. arXiv preprint arXiv:1704.06440.
  11. R. S. Sutton and A. G. Barto (2018): Reinforcement Learning: An Introduction. MIT Press.