Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold (2308.10547v3)
Abstract: The conjugate gradient method is an important first-order optimization method that generally converges faster than the steepest descent method, at a computational cost much lower than that of second-order methods. However, although various conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, they have received little study in distributed settings. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method for minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function and communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, the global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, which reduces the computational complexity required of each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.
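To make the setting more concrete, the following is a minimal NumPy sketch of one retraction-free decentralized update on the Stiefel manifold. It is not the DRCGD algorithm itself: it omits the conjugate direction update entirely and shows only a consensus-plus-projected-gradient step. The abstract does not say what replaces retractions and vector transports, so the use of projection operators here is an assumption. The function names, the mixing matrix `w`, the step size, and the PCA-style local objective are likewise hypothetical choices made for illustration.

```python
import numpy as np


def project_to_stiefel(a):
    """Nearest matrix with orthonormal columns (Frobenius norm), via a thin SVD."""
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u @ vt


def project_to_tangent(x, g):
    """Orthogonal projection of g onto the tangent space of the manifold at x."""
    sym = (x.T @ g + g.T @ x) / 2.0
    return g - x @ sym


def decentralized_projected_step(xs, grads, w, step_size):
    """One consensus + projected-gradient step for every agent.

    Assumption: each agent averages its neighbors' iterates with the weights
    in row i of w, moves along its negative projected local gradient, and maps
    the result back to the manifold by projection instead of a retraction.
    """
    n_agents = len(xs)
    new_xs = []
    for i in range(n_agents):
        mixed = sum(w[i, j] * xs[j] for j in range(n_agents))
        direction = -project_to_tangent(xs[i], grads[i])
        new_xs.append(project_to_stiefel(mixed + step_size * direction))
    return new_xs


# Hypothetical setup: 3 agents, 5x2 Stiefel manifold, fully connected graph.
rng = np.random.default_rng(0)
n_agents, n, p = 3, 5, 2

# Symmetric, doubly stochastic mixing matrix (illustrative values).
w = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Assumed local objectives f_i(x) = -0.5 * trace(x^T A_i x), gradient -A_i x.
local_mats = []
for _ in range(n_agents):
    b = rng.standard_normal((n, n))
    local_mats.append(b @ b.T)

xs = [project_to_stiefel(rng.standard_normal((n, p))) for _ in range(n_agents)]
for _ in range(50):
    grads = [-a_i @ x_i for a_i, x_i in zip(local_mats, xs)]
    xs = decentralized_projected_step(xs, grads, w, step_size=0.01)
```

Both projections need only one small SVD or a few matrix products per agent, which is the kind of saving over retractions and vector transports that the abstract alludes to. A conjugate gradient variant would additionally carry a search direction from one iteration to the next.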