One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling (2403.08896v2)
Abstract: We consider a distributed setup for reinforcement learning in which each agent has a copy of the same Markov Decision Process, but each agent samples transitions from the corresponding Markov chain independently. We show that in this setting we can achieve a linear speedup for TD($\lambda$), a family of popular methods for policy evaluation: $N$ agents can evaluate a policy $N$ times faster, provided the target accuracy is small enough. Notably, this speedup is achieved by ``one-shot averaging,'' a procedure in which the agents run TD($\lambda$) with Markov sampling independently and average their results only after the final step. This significantly reduces the communication required to achieve a linear speedup relative to previous work.
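To make the communication pattern concrete, here is a minimal sketch in Python of TD($\lambda$) with linear function approximation and one-shot averaging. Everything about the environment is a hypothetical illustration, not the paper's setup: the 5-state ring Markov reward process, the rewards, the one-hot (tabular) features, and the constant step size and trace parameters are all stand-in choices, and the paper's actual step-size and mixing conditions are not reproduced here.

```python
import numpy as np

# Toy Markov reward process (hypothetical): a 5-state ring where, under
# the fixed policy being evaluated, the chain advances with prob. 0.7.
N_STATES, GAMMA, LAM, ALPHA = 5, 0.9, 0.7, 0.05
PHI = np.eye(N_STATES)                     # one-hot (tabular) features
REWARD = np.linspace(0.0, 1.0, N_STATES)   # fixed reward per state

def step(s, rng):
    """One Markov transition under the fixed policy."""
    s_next = (s + 1) % N_STATES if rng.random() < 0.7 else s
    return s_next, REWARD[s_next]

def td_lambda(T, seed):
    """TD(lambda) with linear function approximation along a single
    Markov trajectory of length T; returns the final iterate only."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(N_STATES)
    z = np.zeros(N_STATES)                 # eligibility trace
    s = int(rng.integers(N_STATES))
    for _ in range(T):
        s_next, r = step(s, rng)
        z = GAMMA * LAM * z + PHI[s]       # accumulating trace update
        delta = r + GAMMA * theta @ PHI[s_next] - theta @ PHI[s]
        theta = theta + ALPHA * delta * z  # TD(lambda) parameter update
        s = s_next
    return theta

def one_shot_average(n_agents, T):
    """Each agent runs TD(lambda) on its own independent Markov chain
    with NO communication; the final iterates are averaged exactly once."""
    finals = [td_lambda(T, seed=i) for i in range(n_agents)]
    return np.mean(finals, axis=0)

theta_bar = one_shot_average(n_agents=8, T=20_000)
print(theta_bar)
```

The point of the sketch is structural rather than quantitative: the agents exchange nothing during the $T$ Markov-sampled steps and communicate exactly once, when their final iterates are averaged, which is the one-shot pattern the paper analyzes.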