Matrix Low-Rank Approximation For Policy Gradient Methods (2405.17626v1)
Abstract: Estimating a policy that maps states to actions is a central problem in reinforcement learning. Traditionally, policies are inferred from so-called value functions (VFs), but exact VF computation suffers from the curse of dimensionality. Policy gradient (PG) methods bypass this by directly learning a parametric stochastic policy. Typically, the parameters of the policy are estimated using neural networks (NNs) tuned via stochastic gradient descent. However, finding adequate NN architectures can be challenging, and convergence issues are common as well. In this paper, we put forth low-rank matrix-based models to efficiently estimate the parameters of PG algorithms. We collect the parameters of the stochastic policy into a matrix and then leverage matrix-completion techniques to promote (enforce) low rank. We demonstrate via numerical studies how low-rank matrix-based policy models reduce the computational and sample complexities relative to NN models, while achieving a similar aggregated reward.
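The sketch below illustrates the general idea described in the abstract, not the authors' exact implementation. Assumptions: a discrete state-action space, a softmax policy whose logit matrix Theta (of size |S| x |A|) is factored as Theta = L R^T with a small rank, a toy random-walk MDP used only for illustration, and a plain REINFORCE update applied directly to the two low-rank factors.

```python
# Minimal sketch: REINFORCE with a low-rank-factorized softmax policy.
# All environment details and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, RANK = 20, 4, 2               # logit matrix Theta is N_STATES x N_ACTIONS
L = 0.1 * rng.standard_normal((N_STATES, RANK))    # left factor of Theta = L @ R.T
R = 0.1 * rng.standard_normal((N_ACTIONS, RANK))   # right factor
GAMMA, LR, EPISODES, HORIZON = 0.99, 0.05, 500, 50

def policy(s):
    """Softmax over the s-th row of the low-rank logit matrix Theta = L @ R.T."""
    logits = L[s] @ R.T
    p = np.exp(logits - logits.max())
    return p / p.sum()

def step(s, a):
    """Toy dynamics: actions nudge the state left/right; reward for reaching the last state."""
    s_next = int(np.clip(s + (a - N_ACTIONS // 2) + rng.integers(-1, 2), 0, N_STATES - 1))
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r

for ep in range(EPISODES):
    s, traj = int(rng.integers(N_STATES)), []
    for _ in range(HORIZON):
        p = policy(s)
        a = int(rng.choice(N_ACTIONS, p=p))
        s_next, r = step(s, a)
        traj.append((s, a, r))
        s = s_next

    # Monte-Carlo returns for each time step of the episode.
    G, returns = 0.0, []
    for (_, _, r) in reversed(traj):
        G = r + GAMMA * G
        returns.append(G)
    returns.reverse()

    # REINFORCE update of the two factors: grad of log pi(a|s) w.r.t. Theta is
    # (one_hot(a) - pi(.|s)) placed in row s; the chain rule gives the factor gradients.
    for (s_t, a_t, _), G_t in zip(traj, returns):
        p = policy(s_t)
        g_row = -p
        g_row[a_t] += 1.0                    # d log pi / d Theta[s_t, :]
        L[s_t] += LR * G_t * (g_row @ R)     # d Theta[s_t, :] / d L[s_t, :] involves R
        R += LR * G_t * np.outer(g_row, L[s_t])
```

Parameterizing the |S| x |A| logit matrix through two rank-k factors reduces the number of trainable parameters from |S||A| to k(|S| + |A|), which is the source of the computational and sample-complexity savings the abstract refers to.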
- Sergio Rozada
- Antonio G. Marques