
One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling (2403.08896v2)

Published 13 Mar 2024 in cs.LG and cs.DC

Abstract: We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($\lambda$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster provided the target accuracy is small enough. Notably, this speedup is achieved by "one-shot averaging," a procedure where the agents run TD($\lambda$) with Markov sampling independently and only average their results after the final step. This significantly reduces the amount of communication required to achieve a linear speedup relative to previous work.
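The procedure the abstract describes is straightforward to illustrate. Below is a minimal sketch, not the paper's implementation, of one-shot averaging for TD($\lambda$) with linear function approximation: each of $N$ agents runs TD($\lambda$) independently along its own Markov trajectory, and the parameter vectors are averaged only once, after the final step. The random MDP, feature map, step size, and trace-decay values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

S, d = 5, 3                               # number of states, feature dimension (assumed)
P = rng.dirichlet(np.ones(S), size=S)     # transition matrix under a fixed policy (assumed)
R = rng.standard_normal(S)                # per-state rewards (assumed)
Phi = rng.standard_normal((S, d))         # feature matrix, row s is phi(s) (assumed)
gamma, lam, alpha = 0.9, 0.7, 0.01        # discount, trace decay, step size (assumed)


def td_lambda(T, seed):
    """One agent runs TD(lambda) for T steps along its own Markov trajectory."""
    local_rng = np.random.default_rng(seed)
    theta = np.zeros(d)                   # linear value-function parameters
    z = np.zeros(d)                       # eligibility trace
    s = local_rng.integers(S)
    for _ in range(T):
        s_next = local_rng.choice(S, p=P[s])
        # TD error, eligibility-trace update, and parameter update
        delta = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        z = gamma * lam * z + Phi[s]
        theta = theta + alpha * delta * z
        s = s_next
    return theta


# One-shot averaging: N agents run with no communication, then average once at the end.
N, T = 10, 20_000
theta_avg = np.mean([td_lambda(T, seed=i) for i in range(N)], axis=0)
print("averaged TD(lambda) parameters:", theta_avg)

The single communication round at the end is the point of contrast with prior distributed TD analyses, which require the agents to exchange or average iterates repeatedly during training.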
