
Matrix Low-Rank Trust Region Policy Optimization (2405.17625v1)

Published 27 May 2024 in cs.LG and cs.AI

Abstract: Most methods in reinforcement learning use a Policy Gradient (PG) approach to learn a parametric stochastic policy that maps states to actions. The standard approach is to implement such a mapping via a neural network (NN) whose parameters are optimized using stochastic gradient descent. However, PG methods are prone to large policy updates that can render learning inefficient. Trust region algorithms, like Trust Region Policy Optimization (TRPO), constrain the policy update step, ensuring monotonic improvements. This paper introduces low-rank matrix-based models as an efficient alternative for estimating the parameters of TRPO algorithms. By gathering the stochastic policy's parameters into a matrix and applying matrix-completion techniques, we promote and enforce low rank. Our numerical studies demonstrate that low-rank matrix-based policy models effectively reduce both computational and sample complexities compared to NN models, while maintaining comparable aggregated rewards.
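
To make the abstract's core idea concrete (collect the stochastic policy's parameters in a matrix and constrain that matrix to be low rank), the sketch below illustrates one way such a policy can look. It is a minimal, illustrative NumPy example, not the authors' implementation: it parameterizes the logits of a softmax policy over a discretized state-action space as a rank-k product U V^T, takes a REINFORCE-style gradient step on the factors, and accepts the step only if the KL divergence to the old policy stays under a threshold, a crude stand-in for TRPO's trust-region constraint. All sizes, step sizes, and names (`policy`, `kl`, `update`) are assumptions made for this example.

```python
import numpy as np

# Minimal sketch of a low-rank softmax policy over a discretized
# state-action space: the logit matrix Theta (|S| x |A|) is modeled
# as U @ V.T with a small rank k, so only (|S| + |A|) * k parameters
# are learned instead of |S| * |A|. Sizes below are illustrative.
n_states, n_actions, rank = 50, 5, 3
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_states, rank))
V = 0.1 * rng.standard_normal((n_actions, rank))

def policy(s, U, V):
    """Action probabilities pi(.|s) from the low-rank logits U @ V.T."""
    logits = U[s] @ V.T            # row s of Theta = U V^T
    logits -= logits.max()         # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def kl(p, q):
    """KL divergence between two discrete action distributions."""
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def update(U, V, s, a, advantage, lr=0.05, max_kl=0.01):
    """One REINFORCE-style step on the factors with a crude KL guard."""
    p_old = policy(s, U, V)
    # grad of log pi(a|s) w.r.t. the logits is one_hot(a) - pi(.|s)
    g_logits = -p_old
    g_logits[a] += 1.0
    # chain rule through Theta = U V^T, touching only state row s of U
    U_new, V_new = U.copy(), V.copy()
    U_new[s] += lr * advantage * (g_logits @ V)
    V_new += lr * advantage * np.outer(g_logits, U[s])
    if kl(p_old, policy(s, U_new, V_new)) <= max_kl:
        return U_new, V_new        # accept: step stays in the trust region
    return U, V                    # reject: step too large

s, a = 7, 2                        # dummy transition with advantage 1.0
U, V = update(U, V, s, a, advantage=1.0)
print(policy(s, U, V))
```

The factorization is where the savings come from: a 50 x 5 logit table has 250 free parameters, while the rank-3 factors have only (50 + 5) x 3 = 165, and the gap widens quickly as the discretized state space grows.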

